I'm using the CodeDom compiler to dynamically compile user-defined scripts. We're working with C# scripts as standard, but I was wondering whether there is a way to support all CLI languages. To do that, I'd have to detect which CLI language a particular piece of source code is written in.
Is there an elegant way to detect the CLI language from the source code alone?
Thanks
There are only 3 languages for which the framework provides a CodeDomProvider: C#, JScript and VB. Therefore, if the framework provides any direct method to parse "any language", it can only support these 3 languages. I don't think it does.
You may want to try parsing your code with all three implementations of CodeDomProvider and keep the first one that succeeds. It should take about a dozen lines of code.
I think that might be your best bet.
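The "try each one and keep the first that succeeds" loop can be sketched independently of CodeDom. This is only an illustration in Python, under the assumption that each candidate language offers a parse function that raises on invalid input. The two stand-in parsers (`json.loads` and `ast.parse`) are placeholders for the three CodeDomProvider implementations, and `detect_language` is a hypothetical name.

```python
import ast
import json

# Stand-in parsers: each one raises an exception when the source is not valid
# in its language. In the .NET setting these would wrap the C#, VB and JScript
# providers instead (that wiring is an assumption and is not shown here).
CANDIDATE_PARSERS = [
    ("json", json.loads),
    ("python", ast.parse),
]

def detect_language(source):
    """Return the name of the first candidate that accepts `source`, or None."""
    for name, parse in CANDIDATE_PARSERS:
        try:
            parse(source)
        except (SyntaxError, ValueError):
            continue  # this candidate rejected the source; try the next one
        return name
    return None
```

Note that the order of the candidates matters: a source that is valid in more than one language is attributed to whichever candidate is tried first.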
Documentation for CodeDomProvider: http://msdn.microsoft.com/en-us/library/ds075xdx.aspx
Documentation for CodeDomProvider.Parse: http://msdn.microsoft.com/en-us/library/system.codedom.compiler.codedomprovider.parse%28v=vs.110%29.aspx
Not sure if the title explains it correctly.
Anyway, I'm building a .NET WPF application which should go through JavaScript code and identify issues such as:
1. Whether the variables defined are being nullified at the end
2. Whether try/catch/finally blocks are being used
3. Function calls
I went through the questions here, which all revolved around C/C++. Now I regret bunking my compilers classes.
I wanted to know how to verify points 1-3 in C#. Any library out there which does this?
What you're looking for is an abstract syntax tree parser for JavaScript written in C#.
There are a few choices I know of:
Microsoft's Ajax Minifier library comes with its own AST parser (used to minify / optimize JavaScript files). You can find the source code for that on GitHub.
Esprima.net is another option. It's a port of the popular JavaScript library Esprima.
The good thing about Esprima is that it outputs the AST in a common format (defined by Mozilla here) that's used across several parsers. That makes it easy to port utilities for walking the tree, since they all use the same underlying data structure.
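To show what walking such a tree looks like, here is a minimal sketch in Python over a hand-written, abbreviated fragment of an AST in that common format. The node type names (`TryStatement`, `CallExpression`, `MemberExpression`, `Identifier`) come from the format itself; the helper names `walk`, `callee_name` and `summarize` are hypothetical. The same traversal shape should carry over to a C# port, although that port's API is not shown here.

```python
# A hand-written fragment of an ESTree-style AST, roughly what a parser such
# as Esprima might emit for:  try { save(data); } finally { log.close(); }
# (Abbreviated for illustration; real output carries more fields.)
SAMPLE_AST = {
    "type": "Program",
    "body": [{
        "type": "TryStatement",
        "block": {"type": "BlockStatement", "body": [{
            "type": "ExpressionStatement",
            "expression": {
                "type": "CallExpression",
                "callee": {"type": "Identifier", "name": "save"},
                "arguments": [{"type": "Identifier", "name": "data"}],
            },
        }]},
        "handler": None,
        "finalizer": {"type": "BlockStatement", "body": [{
            "type": "ExpressionStatement",
            "expression": {
                "type": "CallExpression",
                "callee": {
                    "type": "MemberExpression",
                    "object": {"type": "Identifier", "name": "log"},
                    "property": {"type": "Identifier", "name": "close"},
                    "computed": False,
                },
                "arguments": [],
            },
        }]},
    }],
}

def walk(node):
    """Yield every AST node (a dict with a "type" key) in depth-first order."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for child in node.values():
            yield from walk(child)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)

def callee_name(callee):
    """Render the called expression as a dotted name where possible."""
    if callee["type"] == "Identifier":
        return callee["name"]
    if callee["type"] == "MemberExpression" and not callee.get("computed"):
        return callee_name(callee["object"]) + "." + callee["property"]["name"]
    return "<expression>"

def summarize(tree):
    """Count try statements and list the names of the functions called."""
    nodes = list(walk(tree))
    tries = [n for n in nodes if n["type"] == "TryStatement"]
    calls = [callee_name(n["callee"]) for n in nodes if n["type"] == "CallExpression"]
    return {"try_statements": len(tries), "calls": calls}
```

Calling `summarize(SAMPLE_AST)` reports one try statement and the calls `save` and `log.close`. Checking whether variables are nullified would need an extra pass over assignment nodes, which this sketch leaves out.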
Check out IronJS; I know they have a pretty good JavaScript library for .NET.
I'd like to know good strategies for deploying a domain-specific-language which must run under at least 2 languages (Java, C#) and probably more (Python, and possibly Javascript).
Some background. We have developed and deployed a domain-specific language currently written in C#. It's deployed through a series of method calls whose arguments are either common language primitives (string, double, etc.), Collections (IEnumerable, HashSet, ...) or objects in a domain-specific library (CMLMolecule, Point3, RealSquareMatrix). The library is well tested and the objects have to comply with a stable deployed XML schema, so change will be evolutionary and managed (at least that's the hope).
We hope the language will become used by a wide and partially computer-literate community, used to hacking their own solutions without central control. Ideally the DSL will create a degree of encapsulation and produce the essential functionality they need. The libraries will manage the detailed algorithms which are many and varied but fairly well known. There's a lot in common with the requirements of the DSL in Domain-specific languages vs. library of functions.
I'd appreciate ideas on the best architecture (clearly once it's deployed we cannot easily backtrack). The choices include at least:
Creation of an IDL (e.g. through CORBA). The W3C did this for the XML DOM - I hated it - and it seems to be overkill.
Manual creation of similar signatures for each platform, with best endeavours to keep them in sync.
Creation of a parsable language (e.g. CSS).
Declarative programming in XML (cf. XSLT). This is my preferred solution as it can be searched, manipulated, etc.
Performance is not important. Clarity of purpose is.
EDIT: There was discussion as to whether application calls constitute a DSL. I have discovered Martin Fowler's introduction to DSLs (http://martinfowler.com/dslwip/Intro.html) where he argues that simple method calls (or chained calls) can be called a DSL. So a series like:
point0 = line0.intersectWith(plane);
point1 = line1.intersectWith(plane);
midpoint = point0.midpoint(point1);
could be considered a DSL.
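For concreteness, here is a minimal sketch in Python of a library that makes those three calls work. Only `Point3`, `intersectWith` and `midpoint` are names taken from the question; the `Line` and `Plane` classes, their fields, and the choice to store vectors as `Point3` values are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point3:
    x: float
    y: float
    z: float

    def midpoint(self, other):
        """Return the point halfway between this point and `other`."""
        return Point3((self.x + other.x) / 2,
                      (self.y + other.y) / 2,
                      (self.z + other.z) / 2)

def _dot(a, b):
    return a.x * b.x + a.y * b.y + a.z * b.z

@dataclass(frozen=True)
class Plane:
    point: Point3   # any point on the plane
    normal: Point3  # normal vector (stored as a Point3 for brevity)

@dataclass(frozen=True)
class Line:
    origin: Point3     # any point on the line
    direction: Point3  # direction vector (stored as a Point3 for brevity)

    def intersectWith(self, plane):
        """Return the intersection point, or None if the line is parallel."""
        denom = _dot(plane.normal, self.direction)
        if denom == 0:
            return None
        diff = Point3(plane.point.x - self.origin.x,
                      plane.point.y - self.origin.y,
                      plane.point.z - self.origin.z)
        t = _dot(plane.normal, diff) / denom
        return Point3(self.origin.x + t * self.direction.x,
                      self.origin.y + t * self.direction.y,
                      self.origin.z + t * self.direction.z)

# The three calls from the question, against the plane z = 2.
plane = Plane(Point3(0, 0, 2), Point3(0, 0, 1))
line0 = Line(Point3(0, 0, 0), Point3(0, 0, 1))
line1 = Line(Point3(2, 0, 0), Point3(0, 0, 1))
point0 = line0.intersectWith(plane)
point1 = line1.intersectWith(plane)
midpoint = point0.midpoint(point1)
```

The user-facing part is only the last three lines; everything above them is ordinary library code.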
There seems to be some ambiguity in the question between language and library. The terms "internal DSL" and "external DSL" are useful, and I think are due to Martin Fowler.
An "external" DSL might be a standalone command-line tool. It is passed a string of source, it parses it somehow, and does something with it. There are no real limits on how the syntax and semantics can work. It can also be made available as a library consisting mostly of an eval-like method; a common example would be building a SQL query as a string and calling an execute method in an RDBMS library; not a very pleasant or convenient usage pattern, and horrible if spread around a program on a large scale.
An "internal" DSL is a library that is written in such a way as to take advantage of the quirks of a host (general purpose) language to create the impression that a new language can be embedded inside an existing one. In syntactically-rich languages (C++, C#) this means using operator overloading in ways that seriously stretch (or ignore) the usual meanings of the operator symbols. There are many examples in C++; a few in C# also - the Irony parser toolkit simulates BNF in a fairly restrained way which works well.
Finally, there is a plain old library: classes, methods, properties, with well-chosen names.
An external DSL would allow you to completely ignore cross-language integration problems, as the only library-like portion would be an eval method. But inventing your own tool chain is non-trivial. People always forget the huge importance of debugging, intellisense, syntax highlighting etc.
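To make the "library consisting mostly of an eval-like method" concrete, here is a minimal sketch in Python of an external DSL. The toy language (one `name = expression` statement per line, with `+` and `-` only) and the function names are invented for illustration; they are not part of the question's DSL.

```python
import re

# One token: a number, a name, or one of the operators + - =
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|([-+=]))")

def tokenize(line):
    """Split one source line into (kind, value) tokens."""
    tokens, pos = [], 0
    line = line.rstrip()
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near column {pos}: {line!r}")
        number, name, op = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        pos = match.end()
    return tokens

def eval_expr(tokens, env, line):
    """Evaluate: operand (('+' | '-') operand)*, left to right."""
    def operand(token):
        kind, value = token
        if kind == "num":
            return value
        if kind == "name":
            if value not in env:
                raise NameError(f"undefined name {value!r}: {line!r}")
            return env[value]
        raise SyntaxError(f"expected a number or a name: {line!r}")

    total = operand(tokens[0])
    rest = tokens[1:]
    if len(rest) % 2:
        raise SyntaxError(f"dangling operator or operand: {line!r}")
    for i in range(0, len(rest), 2):
        op_token, value_token = rest[i], rest[i + 1]
        if op_token not in (("op", "+"), ("op", "-")):
            raise SyntaxError(f"expected '+' or '-': {line!r}")
        value = operand(value_token)
        total = total + value if op_token[1] == "+" else total - value
    return total

def evaluate(source):
    """The eval-like entry point: run a script and return its variables."""
    env = {}
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        tokens = tokenize(line)
        if len(tokens) < 3 or tokens[0][0] != "name" or tokens[1] != ("op", "="):
            raise SyntaxError(f"expected 'name = expression': {line!r}")
        env[tokens[0][1]] = eval_expr(tokens[2:], env, line)
    return env
```

The host program only ever calls `evaluate(source)`, which is why the cross-language surface stays so small. Everything else (error messages, and eventually debugging and editor support) is the tool chain you have to build yourself.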
An internal DSL is probably a pointless endeavour if you want to do it well in both C# and Java. The problem is that if you take advantage of the quirks of one host language, you won't necessarily be able to repeat the trick in another. For example, Java has no operator overloading.
Which leaves a plain old library. If you want to span C# and Java (at least), then you are somewhat stuck in terms of a choice of implementation language. Do you really want to write the library twice? One possibility is to write the library in Java, and then use IKVM to cross-compile it to .NET assemblies. This would guarantee you an identical interface on both of those platforms.
On the downside, the API would be expressed in lowest-common-denominator features - which is to say, Java features :). No properties, just getX/setX methods. Steer clear of generics because the two systems are quite different in that respect. Also even the standard way of naming methods differs between the two (camelCase versus PascalCase), so one set of users would smell a rat.
If you are willing to re-describe your language using ANTLR, you could generate your DSL interpreter for multiple languages, including all of the ones you mentioned and more, without having to maintain each version by hand.
ANTLR is a parser/lexer generator with a large number of target languages. This allows you to describe your language once, without having to maintain multiple copies of it.
See the whole list of target languages here.
Although I do not want to promote my own project too much, I would like to mention PIL, a Platform Independent Language, an intermediate language I have been working on to enable the support of multiple software platforms (like Java, Python, ...), specifically for external DSLs. The general idea is that you generate code in PIL (a subset of Java), which the PIL compiler can then translate to one of many other languages, currently just Java or Python, but more will be added in the future.
I presented a paper about this at the Software and Language Engineering conference about 2 days ago; you can find a link to the publication on the PIL website (pil-lang.org), if you're interested.
One thing to build in: the ability to escape to the implementation language when you need to do something that just isn't supported by your DSL, or for performance reasons (though I realize that isn't a priority).
I am researching DSLs for implementing rules in a rule engine in C#. Some of the rules are really complex and may change significantly in the future, so being able to escape out to C# is really useful. Of course this breaks cross-platform compatibility, but it is really just a way of hacking around edge cases without having to change your DSL.
You'd be best off writing the library in C (or some language like RPython which will generate C code) and then using SWIG or similar to generate the language-specific bindings for C#, Java, Python, etc.
Note that this approach won't help if you are using JavaScript in the browser - you'll have to write the JavaScript library separately. If you are using JavaScript through Rhino, then you'd be able to just use the Java bindings.
It is possible to interpret JavaScript from inside a Java program directly using the script engine, and apparently also from C#. Python can be run on both the JVM and the .NET runtime.
I would suggest that you investigate these options, and then write your library in a common subset of the execution paths available to the language you choose. I would not consider writing it in a language which requires post translation and conversion, since you introduce a step which can be very, very difficult to debug in case of problems.
I would like to expand on Darien's answer. I think that ANTLR brings something to the table that few other lexer/parser tools provide (at least to my knowledge). If you would like to create a DSL which ultimately generates Java and C# code, ANTLR really shines.
ANTLR provides four fundamental components:
Lexer Grammar (break down input streams into tokens)
Parser Grammar (organize tokens into an abstract syntax tree)
Tree Grammar (walk the abstract syntax tree and pipe the metadata into a template engine)
StringTemplate (a template engine based on functional programming principles)
Your lexer, parser, and tree grammars can remain independent of your final generated language. In fact, the StringTemplate engine supports logical groups of template definitions. It even provides for interface inheritance of template groups. This means you can have third parties use your ANTLR parser to create, say, Python, assembly, C, or Ruby, when all you initially provided was Java and C# output. The output language of your DSL can easily be extended as requirements change over time.
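The idea of keeping the tree walk fixed while swapping template groups can be illustrated without ANTLR itself. The sketch below is plain Python using the standard library's `string.Template` as a stand-in for StringTemplate; the tree shape, the group layout and all function names are assumptions made for illustration and are not ANTLR's API.

```python
from string import Template

def unchanged(name):
    return name

def pascal_case(name):
    return name[:1].upper() + name[1:]

# One template group per output language. Adding a language means adding a
# group; the tree walker below does not change.
TEMPLATE_GROUPS = {
    "java": {
        "assign": Template("Point3 $target = $value;"),
        "call": Template("$receiver.$method($args)"),
        "method_name": unchanged,       # Java convention: camelCase methods
    },
    "csharp": {
        "assign": Template("var $target = $value;"),
        "call": Template("$receiver.$method($args)"),
        "method_name": pascal_case,     # C# convention: PascalCase methods
    },
}

def render(node, group):
    """Walk the tree and fill in templates from the chosen group."""
    kind = node[0]
    if kind == "name":
        return node[1]
    if kind == "call":
        _, receiver, method, args = node
        return group["call"].substitute(
            receiver=render(receiver, group),
            method=group["method_name"](method),
            args=", ".join(render(arg, group) for arg in args),
        )
    if kind == "assign":
        _, target, value = node
        return group["assign"].substitute(target=target,
                                          value=render(value, group))
    raise ValueError(f"unknown node kind: {kind!r}")

def generate(tree, language):
    return render(tree, TEMPLATE_GROUPS[language])

# Hand-built tree for the DSL statement: point0 = line0.intersectWith(plane)
TREE = ("assign", "point0",
        ("call", ("name", "line0"), "intersectWith", [("name", "plane")]))
```

With this tree, `generate(TREE, "java")` yields `Point3 point0 = line0.intersectWith(plane);` and `generate(TREE, "csharp")` yields `var point0 = line0.IntersectWith(plane);`. A third party could add a further group without touching `render`.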
To get the most out of ANTLR you will want to read the following:
The Definitive ANTLR Reference: Building Domain-Specific Languages
Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages
Is there any functionality built into the .NET framework somewhere to tokenize C# code? I'm not looking to build a tokenizer in C#, I'm looking for something that can tokenize C# source code.
The only thing that comes to mind is a parser generator like ANTLR, which has a sample C# grammar available. Bison/Flex also appears to have a pretty decent C# grammar. Parsing any language and then actually making sense of it is fairly difficult, so I wish you the best of luck.
No, not built into the framework.
However, you may want to look at Irony and at C# Parser on CodePlex, as they both provide a parser/lexer for at least simple C#.
The GOLD Parser too has a C# grammar (to parse C#), and run-time engines written in C# (so that you can execute that grammar using C# code).
I'm looking for a turn-key ANTLR grammar for C# that generates a usable Abstract Syntax Tree (AST) and is either back-end language agnostic or targets C#, C, C++ or D.
It doesn't need to support error reporting.
P.S. I'm willing to do hardly any fix-up, as the alternative is not very hard.
This may be waaaay too late, but you can get a C# 4 grammar.
Here's a C# grammar link, as well as an overview of C# and ANTLR. There are others for the other languages you mentioned here.
The DMS Software Reengineering Toolkit provides a full, validated grammar for C# 1.2, 2.0 and 3.0 with generics and LINQ expressions.
It automatically builds ASTs and gives you programmatic access to them for analysis or transformation, or you can apply source-to-source transformations that also directly manipulate the tree. The resulting AST can be prettyprinted back to source code, even retaining indentation and comments.
DMS also has mature front ends for other languages such as Java, PHP5, JavaScript, COBOL, C and C++.
EDIT: 1/31/2010: The DMS C# parser has been extended to handle full C# 4.0.
You can find a C# 6 ANTLR grammar in the official grammars repository.