Parser for the Mathematica syntax?

Is there a ready-made parser that I can use from C# to parse Mathematica expressions?
I know that I can use the Kernel itself to parse an expression, and use .NET/Link to retrieve the tree structure... But I'm looking for something that doesn't rely on the Kernel.

My matheclipse-parser module implements a parser in Java which can parse a big subset of Mathematica expressions. See the readme.md page for usage. Maybe you can port the parser to C#?

The Mathematica grammar isn't well documented, true. But AFAIK, it is LALR(1) and likely LL(1); the bracketed/tagged syntax form gives the parser complete clues about what to expect next, just like Lisp and XML.
The DMS Software Reengineering Toolkit does have a Mathematica grammar that has been used for real tasks.
This includes MMa programs as well as pure expression forms.
That probably doesn't help you, since you want one in C#.
If you have access to the Kernel, I'd stick to that.

I don't think such a thing exists already (I'd love to know about it). But it may be useful that within Mathematica you can apply the function FullForm to any expression and get something very easy to parse, kind of like an s-expression in Lisp. For example,
FullForm[a+b*c]
yields
Plus[a, Times[b,c]]
That's the underlying representation of all Mathematica expressions and should be straightforward to parse.
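To illustrate how little machinery FullForm needs, here is a minimal sketch of a parser for it. It is written in Python for brevity and should port to C# almost line by line. The names `tokenize` and `parse_fullform` are made up for this example, and it assumes the input contains only symbols, plain numbers, and `head[arg, ...]` forms (no strings or other atoms).

```python
import re

# One token: a symbol, a plain number, or one of the punctuation marks [ ] ,
TOKEN = re.compile(r"\s*([A-Za-z$][A-Za-z0-9$]*|-?\d+(?:\.\d+)?|[\[\],])")


def tokenize(text):
    """Split FullForm text into a flat list of token strings."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_fullform(text):
    """Parse FullForm text into nested (head, [args]) tuples; atoms stay strings."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def expr():
        node = take()
        if node in ("[", "]", ","):
            raise SyntaxError(f"unexpected {node!r}")
        # A head may be applied repeatedly, as in f[x][y].
        while peek() == "[":
            take("[")
            args = []
            if peek() != "]":
                args.append(expr())
                while peek() == ",":
                    take(",")
                    args.append(expr())
            take("]")
            node = (node, args)
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree
```

Calling `parse_fullform("Plus[a, Times[b,c]]")` gives a nested tuple with `Plus` at the root and the `Times` node as its second argument.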

Related

Creating mathematical function in runtime from string

I am asking this question because I haven't yet found any posts that are C# related, and there might be some built-in methods for this that I couldn't find. If there are, please tell me so and I can close this question.
Basically I have the common situation:
User types a function w.r.t. one or two variables into some TextBlock
I take this string and analyse it
As a return I would like to have a delegate to a method that will take one or two inputs (the variables) and return the function value according to what the user typed in.
Now, I could probably think of an algorithm (and I would like to do this on my own, because I want to use my brain) that analyses the string step by step to find out what has to be calculated first and in what way. E.g. first scan for parentheses, look for the expression within a group of parentheses and calculate that according to more general functions etc.
But in the end I would like to "create" a method of this analysis to be easily used as a normal delegate with a couple of arguments that will return the correct function value.
Are there any methods included in C# for that already, or would I have to go and program everything by myself?
As a remark: I don't want to use anybody else's library, only .NET libraries are acceptable for me.
Edit: After Matt pointed out expression trees, I found this thread which is a good example for my problem.
Edit2: The example pointed out only covers simple functions and will not be useful if I want to include more complex functions such as trigonometric ones or exponentials.

What you are describing is a parser. There are a number of different ways of implementing them, although generally speaking, for complex grammars, a "parser generator" is often used.
A parser generator will take a description of the grammar and convert it into code that will parse text that conforms to the grammar into some form of internal representation that can be manipulated by the program, e.g. a parse tree.
Since you indicate you want to avoid third-party libraries, I'll assume that the use of a parser generator is similarly excluded, which leaves you with implementing your own parser (which fortunately is quite an interesting exercise).
The Wikipedia page on Recursive descent parsers will be particularly useful. I suggest reading through it and perhaps adapting the example code therein to your particular use case. I have done this myself a number of times for different grammars with this as a starting point, so can attest to its usefulness.
The output from such a parser will be a "parse tree". And you then have a number of possibilities for how you convert this into an executable delegate. One option is to implement an Evaluate() method on your parse tree nodes, which will take a set of variables and return the result of evaluating the user's expression. As others have mentioned, your parse tree could leverage .NET's Expression trees, or you can go down the route of emitting IL directly (permitting you to produce a compiled .NET assembly from the user's expression for later use as required).
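As a rough sketch of the recursive descent idea (in Python for brevity; the structure maps onto C# methods returning `Func` delegates or Expression tree nodes), the following turns a string into a callable. Everything here is an assumption made for illustration: the name `compile_expression`, the small grammar, and the handful of supported functions. Instead of building an explicit parse tree with an Evaluate() method, each grammar rule returns a closure directly.

```python
import math
import operator
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[-+*/^()])")
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "sqrt": math.sqrt}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def binary(fn, left, right):
    """Combine two compiled sub-expressions with a binary operator."""
    return lambda env: fn(left(env), right(env))


def compile_expression(text, variables=("x", "y")):
    """Compile text such as 'sin(x) + 2*y' into a function of the variables."""
    tokens = TOKEN.findall(text)
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise SyntaxError("input contains characters outside the grammar")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def expr():  # expr := term (('+' | '-') term)*
        left = term()
        while peek() in ("+", "-"):
            left = binary(OPS[take()], left, term())
        return left

    def term():  # term := unary (('*' | '/') unary)*
        left = unary()
        while peek() in ("*", "/"):
            left = binary(OPS[take()], left, unary())
        return left

    def unary():  # unary := '-' unary | power
        if peek() == "-":
            take()
            operand = unary()
            return lambda env: -operand(env)
        return power()

    def power():  # power := atom ('^' unary)?   (right-associative)
        base = atom()
        if peek() == "^":
            take()
            return binary(operator.pow, base, unary())
        return base

    def atom():  # atom := number | variable | function '(' expr ')' | '(' expr ')'
        tok = take()
        if tok == "(":
            inner = expr()
            take(")")
            return inner
        if tok[0].isdigit():
            value = float(tok)
            return lambda env: value
        if tok in FUNCTIONS:
            take("(")
            arg = expr()
            take(")")
            fn = FUNCTIONS[tok]
            return lambda env: fn(arg(env))
        if tok in variables:
            return lambda env: env[tok]
        raise SyntaxError(f"unknown name or symbol {tok!r}")

    body = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return lambda *args: body(dict(zip(variables, args)))
```

Usage would look like `f = compile_expression("sin(x)^2 + 2*y")` followed by `f(0.3, 1.5)`. Adding another function is one more entry in the `FUNCTIONS` table, which is how trigonometric and exponential functions fit in without changing the grammar.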

You might want to look at expression trees.

Check out NCalc for some examples of how to do this. You don't need to use the library, but reading the source is pretty educational.

I found a very helpful pdf explaining the parsing in C# 2.0. This link leads to a very good tutorial on parsers used in C# and also applies that later on to an arithmetic expression.
As this directly helps and answers to my question, I posted this as an answer, rather than as a comment or edit.

How to work with textual formats in otherwise procedural code?

This question sounds trivial but let me explain my scenario.
I am working in an object oriented programming language (C#) and most of the actual execution code is procedural, i.e. series of statements, sometimes branches and loops. Fairly standard.
Now I am presented with a task to deal with a textual format (PGN, but it could be anything else, such as vCard or some custom format). At least for me, the "standard" way to work with it would be to use a mix of:
regular expressions
if / switch statements
for-loops
storing regexp matches into some custom structure and / or outputting it to some result format
However, I don't like this procedural approach at all - regular expressions are prone to errors, the code is usually quite hard to understand and debug, it usually tends to have quite a high cyclomatic complexity etc.
Simply put, I'd like it to be declarative but I don't know what tools or libraries to use.
I remember that when I saw demos of the "M" language I thought that it was exactly what I was looking for. There was a simple way to declare the syntax of my textual format, the tool would then automatically parse an input string into an in-memory representation of the textual DSL, and I think that it was also possible to transform the format into another etc.
I have been also in touch with the people behind JetBrains MPS which is another tool for working with DSLs but my scenario doesn't seem to be a perfect match for what they are trying to provide.
So if anyone has any idea about how to elegantly deal with textual formats in otherwise procedural code base, I'd be happy to learn about the options.

Check out my open source project meta#. I think it sounds like exactly what you're looking for.

Code parsing C#

I am researching ways, tools and techniques to parse code files in order to support syntax highlighting and IntelliSense in an editor written in C#.
Does anyone have any ideas/patterns & practices/tools/techniques for that?
EDIT: A nice source of info for anyone interested:
Parsing beyond Context-free grammars
ISBN 978-3-642-14845-3

My favourite parser for C# is Irony: http://irony.codeplex.com/ - I have used it a couple of times with great success.
Here is a wikipedia page listing many more: http://en.wikipedia.org/wiki/Compiler-compiler

There are two basic approaches:
1) Parse the entire solution and everything it references so you understand all the types involved in the code
2) Parse locally and do your best to guess what the types etc. are.
The trouble with (2) is that you have to guess, and in some circumstances you just can't tell from a code snippet exactly what everything is. But if you're happy with the sort of syntax highlighting shown on (e.g.) Stack Overflow, then this approach is easy and quite effective.
To do (1) you need to do one of the following (in decreasing order of difficulty):
Parse all the source code. Not possible if you reference 3rd party assemblies.
Use reflection on the compiled code to garner type information you can use when parsing the source.
Use the host IDE's code element interfaces (if available, so not applicable in your case!) to provide the information you need

You could take a look at how http://www.icsharpcode.net/ did it. They wrote a book doing just that, Dissecting a C# Application: Inside SharpDevelop, it even has a chapter called "Implement a parser to provide syntax highlighting and auto-completion as users type".

Creating a scripting language to be used to create web pages

I am creating a scripting language to be used to create web pages, but don't know exactly where to begin.
I have a file that looks like this:
mylanguagename(main) {
    OnLoad(protected) {
        Display(img, text, link);
    }
    Canvas(public) {
        Image img: "Images\my_image.png";
        img.Name: "img";
        img.Border: "None";
        img.BackgroundColor: "Transparent";
        img.Position: 10, 10;
        Text text: "This is a multiline str#ning. The #n creates a new line.";
        text.Name: text;
        text.Position: 10, 25;
        Link link: "Click here to enlarge img.";
        link.Name: "link";
        link.Position: 10, 60;
        link.Event: link.Clicked;
    }
    link.Clicked(sender, link, protected) {
        Image img: from Canvas.FindElement(img);
        img.Size: 300, 300;
    }
}
... and I need to be able to make that text above target the Windows Scripting Host. I know this can be done, because there used to be a lot of Docs on it around the net a while back, but I cannot seem to find them now.
Can somebody please help, or get me started in the right direction?
Thanks

You're making a domain-specific language which does not exist. You want to translate to another language. You will need a proper scanner and parser. You've probably been told to look at ANTLR, yacc/bison, or GOLD. What went wrong with that?
And as an FYI, it's a fun exercise to make new languages, but before you do for something like this, you might ask a good solid "why? What does my new language provide that I couldn't get any other (reasonable) way?"

The thing to understand about parsing and language creation is that writing a compiler/interpreter is primarily about a set of data transformations done to an input text.
Generally, from an input text you will first translate it into a series of tokens, each token representing a concept in your language or a literal value.
From the token stream, you will generally then create an intermediate structure, typically some kind of tree structure describing the code that was written.
This tree structure can then be validated or modified for various reasons, including optimization.
Once that's done, you'll typically write the tree out to some other form - assembly instructions or even a program in another language - in fact, the earliest versions of C++ wrote out straight C code, which were then compiled by a regular C compiler that had no knowledge of C++ at all. So while skipping the assembly generation step might seem like cheating, it has a long and proud tradition behind it :)
I deliberately haven't gotten into any suggestions for specific libraries, as understanding the overall process is probably much more important than choosing a specific parser technology, for instance. Whether you use lex/yacc or ANTLR or something else is pretty unimportant in the long run. They'll all (basically) work, and have all been used successfully in various projects.
Even doing your own parsing by hand isn't a bad idea, as it will help you to learn the patterns of how parsing is done, and so then using a parser generator will tend to make more sense rather than being a black box of voodoo.
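The stages described above (text, then tokens, then a tree, then output in another language) can be sketched in a few dozen lines. This Python example handles only property assignments of the form `img.Position: 10, 10;` taken from the question. The node type `Assign`, the function names, and the choice to emit multi-value assignments as JavaScript arrays are all assumptions made for illustration, not part of any existing tool.

```python
import re
from dataclasses import dataclass

TOKEN = re.compile(
    r'\s*(?:(?P<string>"[^"]*")|(?P<number>\d+)'
    r"|(?P<name>[A-Za-z_]\w*)|(?P<punct>[.:,;]))"
)


def tokenize(source):
    """Stage 1: text -> list of (kind, text) tokens."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if not match:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


@dataclass
class Assign:
    """Tree node for `target.prop: value, value, ...;`"""
    target: str
    prop: str
    values: list


def parse(tokens):
    """Stage 2: tokens -> list of Assign nodes."""
    pos = 0

    def take(kind, text=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text!r}")
        pos += 1
        return tok_text

    def value():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in ("string", "number"):
            raise SyntaxError("expected a string or a number")
        pos += 1
        return tokens[pos - 1][1]

    statements = []
    while pos < len(tokens):
        target = take("name")
        take("punct", ".")
        prop = take("name")
        take("punct", ":")
        values = [value()]
        while pos < len(tokens) and tokens[pos] == ("punct", ","):
            pos += 1
            values.append(value())
        take("punct", ";")
        statements.append(Assign(target, prop, values))
    return statements


def emit_javascript(statements):
    """Stage 3: tree -> JavaScript source text."""
    lines = []
    for stmt in statements:
        if len(stmt.values) == 1:
            rhs = stmt.values[0]
        else:
            rhs = "[" + ", ".join(stmt.values) + "]"
        lines.append(f"{stmt.target}.{stmt.prop} = {rhs};")
    return "\n".join(lines)
```

A validation or optimization pass would sit between `parse` and `emit_javascript` and work on the list of `Assign` nodes without touching either the tokenizer or the emitter.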

Languages similar to C# are not easy to parse - there are some naturally left-recursive rules. So you have to use a parser generator that can deal with them properly. ANTLR fits well.
If PEG fits better, try this: http://www.meta-alternative.net/mbase.html

So you want to translate C# programs to JavaScript? Script# can do this for you.

Rather than write your own language and then run a translator to convert it into JavaScript, why not extend JavaScript to do what you want it to do?
Take a look at jQuery - it extends JavaScript in many powerful ways with a very natural and fluent syntax. It's almost as good as having your own language. Take a look at the many extensions people have created for it too, especially jQuery UI.

Assuming you are really dedicated to do this, here is the way to go. This is normally what you should do: source -> SCANNER -> tokens -> PARSER -> syntax tree
1) Create a scanner/ parser to parse your language. You need to write a grammar to generate a parser that can scan/parse your syntax, to tokenize/validate them.
I think the easiest way here is to go with Irony, that'll make creating a parser quick and easy. Here is a good starting point
http://www.codeproject.com/KB/recipes/Irony.aspx
2) Build a syntax tree - In this case, I suggest you build a simple XML representation instead of an actual syntax tree, so that you can later walk that XML representation to spit out VBScript/JavaScript. If your requirements are complex (e.g. you want to compile it), you can create a DLR Expression Tree or use the CodeDOM - but here I guess we are talking about a translator, and not about a compiler.
But hey wait - if it is not for educational purposes, consider representing your 'script' as XML right from the beginning, so that you can avoid a scanner/parser in between, before spitting out some VBScript/JavaScript/HTML out of that.

I don't want to be rude... but why are you doing this?
Creating a parser for a regular language is a non-trivial task. Just don't do it.
Why don't you just use HTML, JavaScript and CSS (and jQuery as someone above suggested)?
If you don't know where to begin, then you probably don't have any experience of this kind and probably you don't have a good reason, why to do this.
I want to save you the pain. Forget it. It's probably a BAD IDEA!
M.

Check out Constructing Language Processors for Little Languages. It's a very good intro I believe. In fact I just consulted my copy 2 days ago when I was having trouble with my template language parser.
Use XML if at all possible. You don't want to fiddle with a lexer and parser by hand if you want this thing in production. I've made this mistake a few times. You end up supporting code that you really shouldn't be. It seems that your language is mainly a templating language. XML would work great there. Just as ASPX files are XML. Your server side blocks can be written in Javascript, modified if necessary. If this is a learning exercise then do it all by hand, by all means.
I think writing your own language is a great exercise. So is taking a college level compiler writing class. Good luck.

You obviously need machinery designed to translate languages: parsing, tree building, pattern matching, target-language tree building, target-language prettyprinting.
You can try to do all of this with YACC (or equivalents), but you'll discover that parsing is only a small part of a full translator. This means there's a lot more work to do than just parsing, and that takes time and effort.
Our DMS Software Reengineering Toolkit is a commercial solution to building full translators for relatively modest costs.
If you want to do it on your own from the ground up as an exercise, that's fine. Just be prepared for the effort it really takes.
One last remark: designing a complete language is hard if you want to get a nice result.

Personally I think that every self-imposed challenge is good. I do agree with the other opinions that if what you want is a real solution to a real life problem, it's probably better to stick with proven solutions. However, if as you said yourself, you have an academic interest in solving this problem, then I encourage you to keep on. If this is the case, I might offer a couple of tips to get you on track.
Parsing is not really an easy task, that is why we take at least a semester of it. However, it can be learned. I would recommend starting with Terence Parr's book on language implementation patterns. There are many great books about compiling and parsing, probably the most loved and hated being the Dragon Book.
This is pretty heavy stuff, but if you are really into this, and have the time, you should definitely take a look. This would be the Robinson Crusoe "I'll make it all by myself" approach. I have recently written an LR parser generator and it took me no more than a long weekend, but that was after reading a lot and taking a full two-semester course on compilers.
If you don't have the time or simply don't want to learn to make a parser "like men do", then you can always try a commercial or academic parser generator. ANTLR is just fine, but you have to learn its meta-language. Personally I think that Irony is a great tool, especially because it stays inside C# and you can take a look at the source code and learn for yourself. Since we are here, and I'm not trying to advertise at all, I have posted a tiny tool on CodePlex that could be useful for this task. Take a look for yourself, it's open-source and free.
As a final tip, don't get scared if someone tells you it cannot be done. Parsing is a difficult theoretical problem but it's nothing that can't be learned, and it really is a great tool to have in your portfolio. I think it speaks very well of a developer that he can write a recursive-descent parser by hand, even if he never has to. If you want to pursue this goal to its end, take a college-level compilers course, you'll thank me in a year.

C# and regex for word substitutions with nested tags

I'm trying to create a small app that takes a base text template with specially tagged word arrays, parses the template contents and outputs a randomly generated text document.
Essentially, what I'm trying to do is take this:
<{Hello|Hi|Howdy}> world.
and turn it into this:
Hello world.
OR
Hi world.
OR
Howdy world.
So far, so good. Googling got me enough to be able to successfully extract the inner text between the <{ and }> into an array, from which I then randomly select a word to replace the full <{Hello|Hi|Howdy}>.
The problem I'm having is parsing a nested set of words wrapped in the same tags.
For example, if I start with this:
<{Hello|Hi|Howdy}> world. <{How's <{life|it going}>?|How are you?}>
I'd like to turn it into this:
Hello world. How's life?
OR
Hello world. How's it going?
OR
Hello world. How are you?
and so on...
Could someone suggest a way to do this fairly simply using c# and regex?
I've looked at http://www.vsj.co.uk/articles/display.asp?id=789 and http://www.m-8.dk/resources/RegEx-balancing-group.aspx, and to be honest, a lot of that goes way over my head, so something simple would be nice. ;-)
Thank you.

If you currently have a regex that can correctly parse the values inside your tag into an array (call it A'), then for each value in A', reapply that regex.
You should be able to do this recursively.
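One way to get the same effect without writing the recursion explicitly is to work from the inside out: match only groups that contain no further `<{` or `}>`, replace each with a random choice, and repeat until nothing changes. This is a variation on the suggestion above rather than a literal implementation of it, sketched in Python with the made-up name `spin`. The pattern uses only a negative lookahead, which .NET regular expressions also support, so no balancing groups are needed.

```python
import random
import re

# Matches a <{...}> group whose body contains neither "<{" nor "}>",
# i.e. an innermost group.
INNERMOST = re.compile(r"<\{((?:(?!<\{|\}>).)*)\}>", re.S)


def spin(template, rng=random):
    """Resolve nested <{a|b|c}> groups from the inside out."""

    def choose(match):
        return rng.choice(match.group(1).split("|"))

    previous = None
    while previous != template:  # stop once a pass changes nothing
        previous = template
        template = INNERMOST.sub(choose, template)
    return template
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible. Note that inner groups are resolved even when the enclosing alternative ends up not being chosen, which wastes a little work but keeps the code short.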

There is lex and yacc in the Visual Studio SDK:
These links might help:
http://msdn.microsoft.com/en-us/library/bb165963(VS.80).aspx
http://devhawk.net/2006/09/17/Managed+Lex+And+Yacc.aspx
However, depending on how complex your parsing is going to be (considering possible future changes and additions), you may just want to stick with Regex.

This problem is not well suited to regular expressions. The grammar needed to recognize the expression you described is not a regular grammar.
The expressions described above can, however, be described by a context-free grammar.
You should be able to parse this efficiently with an LL(1) parser. I would say that the problem is better suited to tokenizing the input using lex and constructing an abstract syntax tree using yacc.
Here's a tutorial on Grammars and parsing with C#
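For a grammar this small, a hand-written recursive descent parser is an alternative to lex and yacc. The Python sketch below assumes the grammar `sequence := (literal | group)*` and `group := "<{" sequence ("|" sequence)* "}>"`, builds a tree, and then lists every text the template can produce. The names `parse` and `expand` are made up for this example, and a bare `|` or `}>` outside any group is treated as an error.

```python
from itertools import product


def parse(template):
    """Parse a template into a list of literals and ("choice", [...]) nodes."""
    pos = 0

    def sequence():
        # sequence := (literal | group)*, ending at '|', '}>' or end of input
        nonlocal pos
        items, literal = [], []
        while pos < len(template):
            if template.startswith("<{", pos):
                if literal:
                    items.append("".join(literal))
                    literal = []
                items.append(group())
            elif template.startswith("}>", pos) or template[pos] == "|":
                break
            else:
                literal.append(template[pos])
                pos += 1
        if literal:
            items.append("".join(literal))
        return items

    def group():
        # group := "<{" sequence ("|" sequence)* "}>"
        nonlocal pos
        pos += 2  # skip "<{"
        alternatives = [sequence()]
        while pos < len(template) and template[pos] == "|":
            pos += 1
            alternatives.append(sequence())
        if not template.startswith("}>", pos):
            raise SyntaxError("missing closing '}>'")
        pos += 2
        return ("choice", alternatives)

    tree = sequence()
    if pos != len(template):
        raise SyntaxError(f"unexpected {template[pos]!r} at position {pos}")
    return tree


def expand(sequence):
    """Return every string the parsed sequence can produce."""
    parts = []
    for item in sequence:
        if isinstance(item, str):
            parts.append([item])
        else:
            parts.append([text for alt in item[1] for text in expand(alt)])
    return ["".join(combo) for combo in product(*parts)]
```

Picking one random element of `expand(parse(template))` gives a single generated text, although for large templates it would be cheaper to walk the tree and choose one alternative per group instead of enumerating everything.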

Seems like you're trying to describe and use a Context-Free Grammar rather than a regular expression.
Context-free grammars are strictly more powerful than regular expressions:
Any language that can be generated using regular expressions can be generated by a context-free grammar.
There are languages that can be generated by a context-free grammar that cannot be generated by any regular expression.
For C#, I recommend ANTLR, a framework for language recognition that allows you to construct recognizers, interpreters, compilers, and translators from grammatical descriptions.
