I am looking to write an interpreted language in C#. Where should I start? I know how I would do it using fun string parsing, but what is the correct way?
It can be a pretty difficult endeavour to do right.
If you don't have much knowledge of compiler theory, you should probably start by reading about it.
Just using "fun string parsing", if I understand that term correctly, isn't going to get you very far at all.
The first basic step is to write your language grammar that defines the valid syntax for the language.
A tool like ANTLR will help you get the pieces together, but I would suggest reading the Dragon Book, as it is the canonical starting point for getting up to speed on the subject.
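To make the "grammar first" advice concrete, here is a minimal sketch of a hand-written interpreter for a toy arithmetic language. The grammar in the comment and every name in the code (TinyInterpreter, Evaluate and so on) are made up for illustration; a generator such as ANTLR would produce the parsing code from a grammar file instead.

```csharp
using System;
using System.Globalization;

// Hypothetical toy grammar (an assumption for this sketch):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
public sealed class TinyInterpreter
{
    private readonly string _text;
    private int _pos;

    private TinyInterpreter(string text) { _text = text; }

    // Parses and evaluates the whole input, rejecting trailing input.
    public static double Evaluate(string text)
    {
        var interpreter = new TinyInterpreter(text);
        double value = interpreter.ParseExpr();
        interpreter.SkipSpaces();
        if (interpreter._pos != text.Length)
            throw new FormatException("Unexpected input at position " + interpreter._pos);
        return value;
    }

    private void SkipSpaces()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }

    // Consumes the given character if it comes next; reports whether it did.
    private bool Accept(char c)
    {
        SkipSpaces();
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    // One method per grammar rule: this is the recursive-descent pattern.
    private double ParseExpr()
    {
        double value = ParseTerm();
        while (true)
        {
            if (Accept('+')) value += ParseTerm();
            else if (Accept('-')) value -= ParseTerm();
            else return value;
        }
    }

    private double ParseTerm()
    {
        double value = ParseFactor();
        while (true)
        {
            if (Accept('*')) value *= ParseFactor();
            else if (Accept('/')) value /= ParseFactor();
            else return value;
        }
    }

    private double ParseFactor()
    {
        if (Accept('('))
        {
            double inner = ParseExpr();
            if (!Accept(')')) throw new FormatException("Expected ')' at position " + _pos);
            return inner;
        }
        int start = _pos; // Accept already skipped leading spaces
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException("Expected a number at position " + _pos);
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }
}

public static class TinyInterpreterDemo
{
    public static void Main()
    {
        Console.WriteLine(TinyInterpreter.Evaluate("2 + 3 * (4 - 1)")); // prints 11
    }
}
```

This evaluates while it parses, which is fine for a calculator. A real interpreter would normally build a tree first and walk it afterwards, so that loops and functions can be executed more than once.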
If you want to build an interpreted language on .NET, the DLR is the way to go - check out Martin Maly's LOLCODE sample at http://www.iunknown.com/2007/11/lolcode-on-dlr.html
Edit: Here's another link with more information from Scott Hanselman: http://www.hanselman.com/blog/TheWeeklySourceCode11LOLCodeDLREdition.aspx
Check out the Phoenix compiler from Microsoft. This will provide many of the tools you will need to build a compiler targeting native or managed environments. Among these tools is an optimizing back end.
I second Cycnus' suggestion on reading Aho, Sethi, and Ullman's "Dragon Book" (Wikipedia, Amazon).
RGR
Throughout this site I commonly see people answer questions with such answers as "it works like that because the compiler replaces [thing] with [other thing]", so my question is, how do people know/learn this? Where can I learn these things?
The most definitive source for how the C# compiler interprets code is the C# language spec.
http://www.microsoft.com/download/en/details.aspx?id=7029
Also, the following blogs provide a lot more insight into the C# language. Mandatory reading for anyone who wants to become an expert in the language:
http://blogs.msdn.com/b/ericlippert/
http://msmvps.com/blogs/jon_skeet/
One technique is to compile your code, and then decompile it using tools such as ILSpy. Using such a tool, you can view the raw IL and see for yourself what the compiler produces.
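As an illustration of the kind of rewriting you will see in a decompiler, here is a sketch that puts a using statement next to a hand-written equivalent. The try/finally expansion follows the C# language specification; the Resource class and the method names are invented for this example, and the exact IL you see will vary with compiler version and settings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable type that records what happens to it.
public sealed class Resource : IDisposable
{
    private readonly List<string> _log;
    public Resource(List<string> log) { _log = log; _log.Add("open"); }
    public void Dispose() { _log.Add("dispose"); }
}

public static class LoweringDemo
{
    // What you write.
    public static List<string> WithUsing()
    {
        var log = new List<string>();
        using (var resource = new Resource(log))
        {
            log.Add("work");
        }
        return log;
    }

    // Roughly what the compiler turns the method above into.
    public static List<string> HandLowered()
    {
        var log = new List<string>();
        Resource resource = new Resource(log);
        try
        {
            log.Add("work");
        }
        finally
        {
            if (resource != null) resource.Dispose();
        }
        return log;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", WithUsing()));   // open,work,dispose
        Console.WriteLine(string.Join(",", HandLowered())); // open,work,dispose
    }
}
```

Compiling both methods and opening the assembly in ILSpy should show very similar IL for the two, which is how people end up saying "the compiler replaces X with Y".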
In addition to the other answers, I'd like to mention that LINQPad is my favorite tool for inspecting IL for quick snippets.
You can type a snippet of code, and immediately see the IL.
It's by far the easiest tool to use, and you can make changes and see the results instantly.
In addition to checking the Intermediate Language and reading the language specification, please allow me to add "CLR via C#" by Jeffrey Richter (Microsoft Press, Library of Congress Control Number 2009943026). This reference is amazing, and goes into complete detail on what's happening under the covers.
Niklaus Wirth's book Compiler Construction (PDF) is an introduction to the theory and the techniques of compiler construction. It gives you a general idea of what a compiler is and what it does.
I am researching ways, tools and techniques to parse code files in order to support syntax highlighting and IntelliSense in an editor written in C#.
Does anyone have any ideas, patterns and practices, tools or techniques for that?
EDIT: A nice source of info for anyone interested:
Parsing beyond Context-free grammars
ISBN 978-3-642-14845-3
My favourite parser for C# is Irony: http://irony.codeplex.com/ - I have used it a couple of times with great success.
Here is a Wikipedia page listing many more: http://en.wikipedia.org/wiki/Compiler-compiler
There are two basic approaches:
1) Parse the entire solution and everything it references so you understand all the types involved in the code
2) Parse locally and do your best to guess what types etc are.
The trouble with (2) is that you have to guess, and in some circumstances you just can't tell from a code snippet exactly what everything is. But if you're happy with the sort of syntax highlighting shown on (e.g.) Stack Overflow, then this approach is easy and quite effective.
To do (1) then you need to do one of (in decreasing order of difficulty):
Parse all the source code. Not possible if you reference 3rd party assemblies.
Use reflection on the compiled code to garner type information you can use when parsing the source.
Use the host IDE's code element interfaces (if available - so not applicable in your case!) to provide the information you need.
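The reflection option can be sketched in a few lines. This is only an illustration, with made-up names (CompletionSource, Suggest): it lists the public instance methods and properties of an already loaded type that match what the user has typed so far. For third-party assemblies you would first have to load the assembly, which is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class CompletionSource
{
    // Returns the distinct public instance member names of a type that
    // start with the typed prefix: raw material for a completion list.
    public static List<string> Suggest(Type type, string prefix)
    {
        return type
            .GetMembers(BindingFlags.Public | BindingFlags.Instance)
            .Where(m => m.MemberType == MemberTypes.Method || m.MemberType == MemberTypes.Property)
            // Hide compiler-generated accessors such as get_Length.
            .Where(m => !(m is MethodInfo method && method.IsSpecialName))
            .Select(m => m.Name)
            .Where(name => name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .Distinct()
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        // Simulates the user typing "Trim" after a string expression.
        foreach (string name in Suggest(typeof(string), "Trim"))
            Console.WriteLine(name);
    }
}
```

The hard part in a real editor is working out which type the expression to the left of the dot has; that still needs a parser for the source text.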
You could take a look at how http://www.icsharpcode.net/ did it. They wrote a book doing just that, Dissecting a C# Application: Inside SharpDevelop; it even has a chapter called "Implement a parser to provide syntax highlighting and auto-completion as users type".
first things first;
I am writing a little Lua IDE in C#. The code execution is done by an assembly named LuaInterface. The code editing is done by a Scintilla port, and the RAD / UI interface is via the extensible IDesignSurfaceExt Visual Studio (one-way code generation). File handling is provided by a little SQLite database used as a project package file.
So all in all I've got everything I need together...
The only problem unsolved is the parser / lexer for Lua. I do not want to load & execute the code! I just want to parse the string containing the Lua code and get some information about it, like functions and global vars. I really don't want to write the parser completely myself... (I hate regex - I get them wrong all the time ^^)
Anybody got a link to a .net lua parser lying around?
Just to clarify - I only want to analyse the code at this point - I don't want to run it!
Thanks in advance!
Corelgott
Just for the record:
I went with a combination of:
http://irony.codeplex.com/ - A language implementation kit that can be adapted to parse several languages. (Btw. this one has virtually no documentation whatsoever... so code comments but no docs... but lots of fun...)
and a customized version of
http://luairony.codeplex.com/ - the Lua syntax for Irony (I added some degree of error tolerance)
But I gotta admit, both are pretty heavy stuff... and you kind of open up a box of new problems as well as lots of possibilities...
Cheers, Corelgott
This SO question's responses may be helpful.
Easiest way to parse a Lua datastructure in C# / .Net
Incomplete but:
http://luairony.codeplex.com/
This isn't quite what you're after, but maybe half of it can provide half the answer.
It converts Lua to C, by parsing the Lua to an AST. You could then extract the info you need from the AST. It's written in Lua, but you already know how to call that :)
Have a look here: Lua recipes for LPeg
Maybe you can use one - otherwise I would look at using the extended BNF from the documentation.
I am creating a scripting language to be used to create web pages, but don't know exactly where to begin.
I have a file that looks like this:
mylanguagename(main) {
    OnLoad(protected) {
        Display(img, text, link);
    }
    Canvas(public) {
        Image img: "Images\my_image.png";
        img.Name: "img";
        img.Border: "None";
        img.BackgroundColor: "Transparent";
        img.Position: 10, 10;
        Text text: "This is a multiline str#ning. The #n creates a new line.";
        text.Name: text;
        text.Position: 10, 25;
        Link link: "Click here to enlarge img.";
        link.Name: "link";
        link.Position: 10, 60;
        link.Event: link.Clicked;
    }
    link.Clicked(sender, link, protected) {
        Image img: from Canvas.FindElement(img);
        img.Size: 300, 300;
    }
}
... and I need to be able to make that text above target the Windows Scripting Host. I know this can be done, because there used to be a lot of Docs on it around the net a while back, but I cannot seem to find them now.
Can somebody please help, or get me started in the right direction?
Thanks
You're making a domain-specific language which does not exist. You want to translate to another language. You will need a proper scanner and parser. You've probably been told to look at ANTLR, yacc/bison, or GOLD. What went wrong with that?
And as an FYI, it's a fun exercise to make new languages, but before you do for something like this, you might ask a good solid "why? What does my new language provide that I couldn't get any other (reasonable) way?"
The thing to understand about parsing and language creation is that writing a compiler/interpreter is primarily about a set of data transformations done to an input text.
Generally, from an input text you will first translate it into a series of tokens, each token representing a concept in your language or a literal value.
From the token stream, you will generally then create an intermediate structure, typically some kind of tree structure describing the code that was written.
This tree structure can then be validated or modified for various reasons, including optimization.
Once that's done, you'll typically write the tree out to some other form - assembly instructions or even a program in another language - in fact, the earliest versions of C++ wrote out straight C code, which was then compiled by a regular C compiler that had no knowledge of C++ at all. So while skipping the assembly generation step might seem like cheating, it has a long and proud tradition behind it :)
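The three stages described above can be sketched for the property lines from the question (for example img.Position: 10, 10;). This is a toy under stated assumptions: all type and method names are invented, only "name: literal, literal;" statements are handled, and JavaScript is picked as the target purely for illustration.

```csharp
using System;
using System.Collections.Generic;

public enum TokenKind { Name, Literal, Symbol }
public sealed class Token { public TokenKind Kind; public string Text; }

// Tree node for one statement such as: img.Position: 10, 10;
public sealed class Assignment
{
    public string Target;
    public readonly List<string> Values = new List<string>();
}

public static class MiniTranslator
{
    // Stage 1: input text -> tokens.
    public static List<Token> Tokenize(string source)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            int start = i;
            TokenKind kind;
            if (char.IsLetter(c))
            {
                kind = TokenKind.Name; // dotted names such as img.Position
                while (i < source.Length && (char.IsLetterOrDigit(source[i]) || source[i] == '.' || source[i] == '_')) i++;
            }
            else if (char.IsDigit(c))
            {
                kind = TokenKind.Literal;
                while (i < source.Length && char.IsDigit(source[i])) i++;
            }
            else if (c == '"')
            {
                kind = TokenKind.Literal;
                i = source.IndexOf('"', i + 1) + 1; // just past the closing quote
                if (i == 0) throw new FormatException("Unterminated string");
            }
            else if (c == ':' || c == ',' || c == ';') { kind = TokenKind.Symbol; i++; }
            else throw new FormatException("Unexpected character '" + c + "'");
            tokens.Add(new Token { Kind = kind, Text = source.Substring(start, i - start) });
        }
        return tokens;
    }

    // Stage 2: tokens -> tree (here just a flat list of statement nodes).
    public static List<Assignment> Parse(List<Token> tokens)
    {
        var tree = new List<Assignment>();
        int p = 0;
        while (p < tokens.Count)
        {
            var node = new Assignment { Target = Take(tokens, ref p, TokenKind.Name, null) };
            Take(tokens, ref p, TokenKind.Symbol, ":");
            node.Values.Add(Take(tokens, ref p, TokenKind.Literal, null));
            while (p < tokens.Count && tokens[p].Kind == TokenKind.Symbol && tokens[p].Text == ",")
            {
                p++; // skip the comma
                node.Values.Add(Take(tokens, ref p, TokenKind.Literal, null));
            }
            Take(tokens, ref p, TokenKind.Symbol, ";");
            tree.Add(node);
        }
        return tree;
    }

    // Returns the text of the next token if it matches, otherwise fails.
    private static string Take(List<Token> tokens, ref int p, TokenKind kind, string text)
    {
        if (p >= tokens.Count || tokens[p].Kind != kind || (text != null && tokens[p].Text != text))
            throw new FormatException("Expected " + (text ?? kind.ToString()) + " at token " + p);
        return tokens[p++].Text;
    }

    // Stage 3: tree -> text in the target language.
    public static string Emit(List<Assignment> tree)
    {
        var lines = new List<string>();
        foreach (Assignment node in tree)
        {
            string value = node.Values.Count == 1 ? node.Values[0] : "[" + string.Join(", ", node.Values) + "]";
            lines.Add(node.Target + " = " + value + ";");
        }
        return string.Join("\n", lines);
    }

    public static void Main()
    {
        Console.WriteLine(Emit(Parse(Tokenize("img.Name: \"img\"; img.Position: 10, 10;"))));
    }
}
```

Keeping the stages separate is the point: validation or optimization passes can be slotted in between Parse and Emit without touching the tokenizer.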
I deliberately haven't gotten into any suggestions for specific libraries, as understanding the overall process is probably much more important than choosing a specific parser technology, for instance. Whether you use lex/yacc or ANTLR or something else is pretty unimportant in the long run. They'll all (basically) work, and have all been used successfully in various projects.
Even doing your own parsing by hand isn't a bad idea, as it will help you to learn the patterns of how parsing is done, and so then using a parser generator will tend to make more sense rather than being a black box of voodoo.
Languages similar to C# are not easy to parse - there are some naturally left-recursive rules. So you have to use a parser generator that can deal with them properly. ANTLR fits well.
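To show why left recursion matters, here is a small sketch of the usual workaround in a hand-written parser: the left-recursive rule is rewritten as a loop that still groups to the left. The rule, the class and the method names are made up for this example, and the input is assumed to contain no spaces.

```csharp
using System;

public static class LeftRecursionDemo
{
    // Left-recursive rule as it would naturally be written in a grammar:
    //     expr := expr '-' NUMBER | NUMBER
    // A naive recursive-descent method for this rule would call itself
    // before consuming any input and never terminate. The loop form
    //     expr := NUMBER ('-' NUMBER)*
    // accepts the same strings and keeps the left-to-right grouping.
    public static int Evaluate(string text)
    {
        int pos = 0;
        int value = ReadNumber(text, ref pos); // first NUMBER
        while (pos < text.Length && text[pos] == '-')
        {
            pos++; // consume '-'
            value = value - ReadNumber(text, ref pos); // fold into the left operand
        }
        if (pos != text.Length) throw new FormatException("Unexpected '" + text[pos] + "'");
        return value;
    }

    private static int ReadNumber(string text, ref int pos)
    {
        int start = pos;
        while (pos < text.Length && char.IsDigit(text[pos])) pos++;
        if (start == pos) throw new FormatException("Expected a number at position " + pos);
        return int.Parse(text.Substring(start, pos - start));
    }

    public static void Main()
    {
        Console.WriteLine(Evaluate("10-3-2")); // (10 - 3) - 2, prints 5
    }
}
```

Grouping the other way, 10 - (3 - 2), would give 9, so getting the associativity right is not cosmetic. A parser generator that copes with left recursion saves you from doing this rewrite by hand for every such rule.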
If PEG fits better, try this: http://www.meta-alternative.net/mbase.html
So you want to translate C# programs to JavaScript? Script# can do this for you.
Rather than write your own language and then run a translator to convert it into JavaScript, why not extend JavaScript to do what you want it to do?
Take a look at jQuery - it extends JavaScript in many powerful ways with a very natural and fluent syntax. It's almost as good as having your own language. Take a look at the many extensions people have created for it too, especially jQuery UI.
Assuming you are really dedicated to do this, here is the way to go. This is normally what you should do: source -> SCANNER -> tokens -> PARSER -> syntax tree
1) Create a scanner/parser for your language. You need to write a grammar from which a parser can be generated; that parser tokenizes your source and validates its syntax.
I think the easiest way here is to go with Irony, which will make creating a parser quick and easy. Here is a good starting point:
http://www.codeproject.com/KB/recipes/Irony.aspx
2) Build a syntax tree - In this case, I suggest you build a simple XML representation instead of an actual syntax tree, so that you can later walk the XML representation of your DOM to spit out VBScript/JavaScript. If your requirements are complex (like you want to compile it or so), you can create a DLR Expression Tree or use the CodeDOM - but here I guess we are talking about a translator, and not about a compiler.
But hey wait - if it is not for educational purposes, consider representing your 'script' as XML right from the beginning, so that you can avoid a scanner/parser in between, before spitting out some VBScript/JavaScript/HTML out of that.
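A minimal sketch of walking such an XML representation with System.Xml.Linq follows. The element and attribute names (Canvas, Image, Name, Source and so on) and the shape of the emitted JavaScript are assumptions made up for this example; they are not a format the answer prescribes.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class XmlScriptEmitter
{
    // Walks a hypothetical XML form of the script and emits one JavaScript
    // object literal per child element.
    public static string Emit(XElement canvas)
    {
        var sb = new StringBuilder();
        foreach (XElement element in canvas.Elements())
        {
            string name = (string)element.Attribute("Name");
            if (name == null) throw new FormatException("Element <" + element.Name.LocalName + "> has no Name");
            sb.Append("var ").Append(name).Append(" = { type: \"")
              .Append(element.Name.LocalName).Append("\"");
            foreach (XAttribute attribute in element.Attributes())
            {
                if (attribute.Name.LocalName == "Name") continue;
                // Escape backslashes and quotes so the output stays a valid string literal.
                string value = attribute.Value.Replace("\\", "\\\\").Replace("\"", "\\\"");
                sb.Append(", ").Append(attribute.Name.LocalName)
                  .Append(": \"").Append(value).Append("\"");
            }
            sb.Append(" };\n");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        XElement canvas = XElement.Parse(
            "<Canvas>" +
            "<Image Name=\"img\" Source=\"Images/my_image.png\" Border=\"None\" />" +
            "<Link Name=\"link\" Text=\"Click here to enlarge img.\" />" +
            "</Canvas>");
        Console.Write(Emit(canvas));
    }
}
```

With the sample input this writes one var declaration for img and one for link. The XML parser does the scanning and syntax checking for you, which is the whole appeal of this route.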
I don't want to be rude... but why are you doing this?
Creating a parser for an ordinary programming language is a non-trivial task. Just don't do it.
Why don't you just use HTML, JavaScript and CSS (and jQuery, as someone above suggested)?
If you don't know where to begin, then you probably don't have any experience of this kind, and you probably don't have a good reason to do this.
I want to save you the pain. Forget it. It's probably a BAD IDEA!
M.
Check out Constructing Language Processors for Little Languages. It's a very good intro I believe. In fact I just consulted my copy 2 days ago when I was having trouble with my template language parser.
Use XML if at all possible. You don't want to fiddle with a lexer and parser by hand if you want this thing in production. I've made this mistake a few times. You end up supporting code that you really shouldn't be. It seems that your language is mainly a templating language. XML would work great there, much as ASPX files use XML-like markup. Your server-side blocks can be written in JavaScript, modified if necessary. If this is a learning exercise then do it all by hand, by all means.
I think writing your own language is a great exercise. So is taking a college level compiler writing class. Good luck.
You obviously need machinery designed to translate languages: parsing, tree building, pattern matching, target-language tree building, target-language prettyprinting.
You can try to do all of this with YACC (or equivalents), but you'll discover that parsing is only a small part of a full translator. This means there's a lot more work to do than just parsing, and that takes time and effort.
Our DMS Software Reengineering Toolkit is a commercial solution to building full translators for relatively modest costs.
If you want to do it on your own from the ground up as an exercise, that's fine. Just be prepared for the effort it really takes.
One last remark: designing a complete language is hard if you want to get a nice result.
Personally I think that every self-imposed challenge is good. I do agree with the other opinions that if what you want is a real solution to a real-life problem, it's probably better to stick with proven solutions. However, if, as you said yourself, you have an academic interest in solving this problem, then I encourage you to keep on. If this is the case, I might offer a couple of tips to get you on track.
Parsing is not really an easy task; that is why we take at least a semester of it. However, it can be learned. I would recommend starting with Terence Parr's book on language implementation patterns. There are many great books about compiling and parsing, probably the most loved and hated being the Dragon Book.
This is pretty heavy stuff, but if you are really into this, and have the time, you should definitely take a look. This would be the Robinson Crusoe "I'll make it all by myself" approach. I have recently written an LR parser generator and it took me no more than a long weekend, but that was after reading a lot and taking a full two-semester course on compilers.
If you don't have the time or simply don't want to learn to make a parser "like men do", then you can always try a commercial or academic parser generator. ANTLR is just fine, but you have to learn its meta-language. Personally I think that Irony is a great tool, especially because it stays inside C# and you can take a look at the source code and learn for yourself. Since we are here, and I'm not trying to make any advertisement at all, I have posted a tiny tool on CodePlex that could be useful for this task. Take a look for yourself, it's open-source and free.
As a final tip, don't get scared if someone tells you it cannot be done. Parsing is a difficult theoretical problem but it's nothing that can't be learned, and it really is a great tool to have in your portfolio. I think it speaks very well of a developer that they can write a recursive-descent parser by hand, even if they never have to. If you want to pursue this goal to its end, take a college-level compilers course; you'll thank me in a year.
Any suggestions on how I should approach this? Thanks.
Take a look at this VB to C# Comparison chart for some of the syntax and keyword differences.
I have to do this often - and my biggest hang-up is the semi-colon. Never fails that my first few days of writing VB after a longer stint of C# coding, the VB compiler is always barking at me for putting a semi-colon on every line of VB code.
Other than that, it shouldn't be too painful. If you're fluent in C#, moving to VB might be stressful for the first few days, but after that it should be smooth sailing.
Code converter tools come in handy to help you remember/learn/re-learn all of those odd syntax differences that you forget easily. The one I normally turn to first is http://converter.telerik.com/ - and if that won't do the trick, a quick google search for code converters will turn up a handful of other good ones.
Another pain point that I've had in the past too is Snippets. Snippets in C# rock - but in VB rock a bit less. Get to know the differences between those and life will be much easier. (Come on VB team - get that enter key working like the C# snippet team has it...)
A good C# to VB.NET converter will help.
Aside from revulsion and horror I recommend (from experience - ugh) to just start. Build a simple app. The magic is in the experience. It doesn't make sense until you have spent lots of time trying to figure out why something doesn't work.
I went the other way (VB to C#) and found the syntax to be so similar that the transition was painless. I can now pretty much program in either language – thanks in large part to the IDE's IntelliSense.
Take advantage of the "With" statement! One of my favorite parts of VB.NET.
It's not as difficult as it seems at first. Took me about a month from going strictly C++\C# to VB to get comfortable.
If you are familiar with programming, you should just have to learn the syntax... Why would anyone want to go from C# to VB? Who knows :)
My first question would be 'Why?'. I'd like to think that you can pretty much get the same thing done with either C# or VB.Net. Given that it's managed code, why not just leave them as they are?
Let's just assume you have your reasons :)
1) There are a couple of tools that will do this (see http://www.developerfusion.com/tools/convert/csharp-to-vb/ for a sample).
2) The other option is to manually convert the code, compile, fix errors, and repeat. Painful.
It's a pretty straight-forward thing, actually. VB.Net is a perfectly good (if, imo, verbose) language with most of the expressiveness you've grown accustomed to in C#. Just be aware that certain specific keywords are different and that you've got a different background culture and you'll do fine.
You can also use a tool like CodeRush from DevExpress (no affiliation). The short-cut keys for any operation are the same for both languages and will produce the correct output for the language.
For example: key combo "mv" yields:
In C#
public void MethodName ()
{
}
In VB
Public Sub MethodName()
End Sub
Use XML literals and marvel how resentful fellow C# programmers suddenly are.
There were some useful articles in Visual Studio magazine back in Jan 2008.
What C# developers should know about VB
And for completeness: what VB developers should know about C#