I am writing an interpreter with ANTLR and I am having trouble with the following:
One of my rules states:
expression
: { .. }
| {... }
| identifier
| expression DOT expression
| get_index
| invoke_method
| { ... }
;
I thought this would let me evaluate expressions such as "a.b[2].Run()": the idea is to be able to use variables, indexing, and method calls together. All three syntaxes are in place. How do I chain them, though? Is my approach correct, or is there a better way? (The example above just throws those rules in without detail for the sake of clarity; it's not the actual grammar, but I can assure you the other three rules (identifier, get_index and invoke_method) are defined and work properly. identifier is a single one, not chained.)
I have something that goes alongside:
method_declaration : protection? expression identifier LEFT_PARENTHESES (method_argument (COMMA method_argument)*)? RIGHT_PARENTHESES method_block;
expression
: ...
| ...
| identifier
| kind
;
identifier : IDENTIFIER ;
kind : ... | ... | VOID_KIND; // void for example there are more
IDENTIFIER : (LETTER | '_') (LETTER | DIGIT | '_')*;
VOID_KIND : 'void';
fragment LETTER : [a-zA-Z];
fragment DIGIT : [0-9];
*The other rules in method_declaration are not relevant to this question
What happens is that when I input something such as void Start() { }
and look at the ParseTree, it seems to think void is an identifier and not a kind, and treats it as such.
I tried changing the order in which kind and identifier are written in the .g4 file... but it doesn't quite seem to make any difference... why does this happen and how can I fix it?
The order in which parser rules are defined makes no difference in ANTLR. The order in which token rules are defined does though. Specifically, when multiple token rules match on the current input and produce a token of the same length, ANTLR will apply the one that is defined first in the grammar.
So in your case, you'll want to move VOID_KIND (and any other keyword rules you may have) before IDENTIFIER. So pretty much what you already tried except with the lexer rules instead of the parser rules.
PS: I'm somewhat surprised that ANTLR doesn't produce a warning about VOID_KIND being unmatchable in this case. I'm pretty sure other lexer generators would produce such a warning in cases like this.
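The tie-breaking rule described above can be imitated in a few lines of Python. This is only a toy lexer written to illustrate "longest match wins, and the first-defined rule wins a tie"; it is not how ANTLR is implemented, and the `tokenize` function and the rule lists are made up for the example.

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins; on a tie, the rule listed first wins."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: an earlier rule keeps the win on equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rule tables mirroring the two lexer rules from the question.
IDENT = ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*")
VOID = ("VOID_KIND", r"void")

identifier_first = tokenize("void Start", [IDENT, VOID])  # the broken order
keyword_first = tokenize("void Start", [VOID, IDENT])     # the suggested fix
```

With the identifier rule first, `void` comes out as an IDENTIFIER; with the keyword rule first, it comes out as VOID_KIND, while a longer word such as `voidable` is still an IDENTIFIER because the longer match wins.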
I have a table in which I save an ID and a rule like:
| ID | Rule |
|------|--------------------------------------|
| 1 | firstname[0]+'.'+lastname+'#'+domain |
| 2 | firstname+'_'+lastname+'#'+domain |
| 3 | lastname[0]+firstname+'#'+domain |
My problem is: How can I get and analyze/execute that rule in my code? Because the cell is taken as a string and I don't know how to apply that rule to my variables or my code.
I was thinking about String.Format, but I don't know how to split a string taking just the first character with it.
If you could give me any advice or suggest a better way to do this, I'd appreciate it, because I'm completely lost.
If this is C#, you could build a LINQ expression from a parse tree produced by, for example, ANTLR; if the format is very simple, regular expressions may be enough.
You have to make these steps:
Evaluate the incoming string using ANTLR. You could start off with the C# grammar;
Build an expression from it;
Run the expression giving the firstname, domain, etc. parameters.
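For the "very simple format" case, here is a minimal sketch of the regex route, written in Python rather than C# just to show the shape of it. It assumes a rule is nothing more than terms joined by `+`, where a term is either a quoted literal or a field name with an optional `[index]`. The `evaluate_rule` function and the sample values are made up for illustration.

```python
import re

# One term: a 'quoted literal', or a field name with an optional [index].
TOKEN = re.compile(r"\s*(?:'([^']*)'|([A-Za-z_]\w*)(?:\[(\d+)\])?)\s*")

def evaluate_rule(rule, values):
    """Evaluate a rule such as firstname[0]+'.'+lastname against a dict."""
    parts, pos = [], 0
    while True:
        m = TOKEN.match(rule, pos)
        if not m:
            raise ValueError(f"cannot parse rule at position {pos}: {rule!r}")
        literal, name, index = m.groups()
        if name is not None:
            text = values[name]  # raises KeyError for an unknown field name
            parts.append(text[int(index)] if index is not None else text)
        else:
            parts.append(literal)
        pos = m.end()
        if pos == len(rule):
            return "".join(parts)
        if rule[pos] != "+":
            raise ValueError(f"expected '+' at position {pos}: {rule!r}")
        pos += 1
```

Calling `evaluate_rule("firstname[0]+'.'+lastname+'#'+domain", {...})` with a dict of the three fields produces the concatenated string. Anything beyond this tiny grammar (nested calls, other operators) is where a real parser starts to pay off.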
Not sure that would do the trick, but you might want to look at CSharpCodeProvider. I've never used it, but according to the examples, it seems to be capable of compiling code entered in a textbox.
The thing is that this solution generates an exe file that will be stored in your project folder. Even if you delete them after a successful compiling, that might not be the best option.
I have a string, which contains a custom expression, I have to parse and evaluate:
For example:
(FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3))
INTERSECT (FUNCTION_C(5,4,5) UNION FUNCTION_D(3,3))
FUNCTION_X represent functions, which are implemented in C# and return ILists.
UNION or INTERSECT are custom functions which should be applied to the lists, which are returned from those functions.
Union and intersect are implemented via Enumerable.Intersect/Enumerable.Union.
How can the parsing and evaluating be implemented in an elegant and expandable manner?
It depends on how complex your expressions will become, how many different operators are going to be available, and a number of other factors. Whichever way you do it, you will probably need to first determine a grammar for your mini-language.
For simple grammars, you can just write a custom parser. In the case of many calculators and similar applications, a recursive descent parser is expressive enough to handle the grammar and is intuitive to write. The linked Wikipedia page gives a sample grammar and the implementation of a C parser for it. Eric White also has a blog post on building recursive descent parsers in C#.
For more complex grammars, you will likely want to skip the work of creating this yourself and use a lex/yacc-type lexer and parser toolset. Normally you give as input to these a grammar in EBNF or similar syntax, and they will produce the code necessary to parse the input for you. The parser will typically return a syntax tree which you can traverse, allowing you to apply logic for each token in the input stream (each node in the tree). For C#, I have worked with GPLex and GPPG, but others such as ANTLR are also available.
Basic Parsing Concepts
In general, you want to be able to split each item in the input into a meaningful token, and build a tree based on those tokens. Once the tree is built, you can traverse the tree and perform the necessary action at each node. A syntax tree for FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3) might look like this, where the node types are in capital letters and their values are in parentheses:
                    PROGRAM
                       |
                       |
                     UNION
                       |
          ----------------------------
          |                          |
FUNCTION (FUNCTION_A)      FUNCTION (FUNCTION_B)
          |                          |
   ---------------              -----------
   |      |      |              |         |
 INT(5) INT(4) INT(5)         INT(3)    INT(3)
The parser needs to be smart enough to know that when a UNION is found, it needs to be supplied with two items to union, etc. Given this tree, you would start at the root (PROGRAM) and do a depth-first traversal. At the UNION node, the action would be to first visit all children, and then union the results together. At a FUNCTION node, the action would be to first visit all of the children, find their values, and use those values as parameters to the function, and secondly to evaluate the function on those inputs and return the value.
This would continue for all tokens, for any expression you can come up with. In this way, if you spend the time to get the parser to produce the right tree and each node knows how to perform whatever action it needs to, your design is very extensible and can handle any input that matches the grammar it was designed for.
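To make the tree-then-traverse idea concrete, here is a rough Python sketch (the question is about C#, so treat it as pseudocode for the structure). It assumes function arguments are integer literals only and that UNION and INTERSECT have equal precedence and associate left to right. The tuple-based node layout and the `tokenize`, `parse` and `evaluate` names are my own choices, not a standard API.

```python
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def tokenize(text):
    tokens = []
    for number, name, symbol in TOKEN.findall(text):
        if number:
            tokens.append(("INT", int(number)))
        elif name:
            tokens.append(("NAME", name))
        else:
            tokens.append(("SYM", symbol))
    return tokens

def parse(text):
    """Build a tree of ("CALL", name, args) and (operator, left, right) nodes."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("END", None)

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token}, got {peek()}")
        pos += 1

    def expression():
        nonlocal pos
        node = primary()
        while peek() in (("NAME", "UNION"), ("NAME", "INTERSECT")):
            operator = peek()[1]
            pos += 1
            node = (operator, node, primary())
        return node

    def primary():
        nonlocal pos
        kind, value = peek()
        if (kind, value) == ("SYM", "("):
            pos += 1
            node = expression()
            expect(("SYM", ")"))
            return node
        if kind != "NAME":
            raise SyntaxError(f"expected a function name, got {peek()}")
        pos += 1
        expect(("SYM", "("))
        args = []
        while peek() != ("SYM", ")"):
            if args:
                expect(("SYM", ","))
            arg_kind, number = peek()
            if arg_kind != "INT":
                raise SyntaxError(f"expected an integer, got {peek()}")
            args.append(number)
            pos += 1
        expect(("SYM", ")"))
        return ("CALL", value, args)

    tree = expression()
    expect(("END", None))
    return tree

def evaluate(node, functions):
    """Depth-first traversal: evaluate children first, then combine them."""
    if node[0] == "CALL":
        _, name, args = node
        return list(functions[name](*args))
    operator, left, right = node
    a, b = evaluate(left, functions), evaluate(right, functions)
    if operator == "UNION":
        return list(dict.fromkeys(a + b))            # distinct, order kept
    return [x for x in dict.fromkeys(a) if x in b]   # INTERSECT
```

`functions` is a plain dict from function name to a callable returning a list, so adding a new FUNCTION_X is just a new dict entry, and adding a new set operator is one more branch in `evaluate`.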
I came across the term 'quotation' and I'm trying to find some real-life examples of its usage. Being able to get an AST for each code expression sounds awesome, but how is it used in real life?
Does anyone know such example?
F# and Nemerle quotations are both used for metaprogramming, but the approaches are different: Nemerle uses metaprogramming at compilation time to extend the language, while F# uses them at run time.
Nemerle
In Nemerle, quotations are used within macros for taking apart pieces of code and generating new ones. Much of the language itself is implemented this way. For example, here is the macro from the official library that implements the when conditional construct. Nemerle does not have statements, so an if has to have an else part: the when and unless macros provide shorthand for an if with an empty else part and an empty then part, respectively. The when macro also has extended pattern-matching functionality.
macro whenmacro (cond, body)
syntax ("when", "(", cond, ")", body)
{
    match (cond)
    {
        | <[ $subCond is $pattern ]> with guard = null
        | <[ $subCond is $pattern when $guard ]> =>
            match (pattern)
            {
                | PT.PExpr.Call when guard != null =>
                    // generate expression to replace 'when (expr is call when guard) body'
                    <[ match ($subCond) { | $pattern when $guard => $body : void | _ => () } ]>
                | PT.PExpr.Call =>
                    // generate expression to replace 'when (expr is call) body'
                    <[ match ($subCond) { | $pattern => $body : void | _ => () } ]>
                | _ =>
                    // generate expression to replace 'when (expr is pattern) body'
                    <[ match ($cond) { | true => $body : void | _ => () } ]>
            }
        | _ =>
            // generate expression to replace 'when (cond) body'
            <[ match ($cond : bool) { | true => $body : void | _ => () } ]>
    }
}
The code uses quotation to handle patterns that look like some predefined templates and replace them with corresponding match expressions. For example, matching the cond expression given to the macro with:
<[ $subCond is $pattern when $guard ]>
checks whether it follows the x is y when z pattern and gives us the expressions composing it. If the match succeeds, we can generate a new expression from the parts we got using:
<[
match ($subCond)
{
| $pattern when $guard => $body : void
| _ => ()
}
]>
This converts when (x is y when z) body to a basic pattern-matching expression. All of this is automatically type-safe and produces reasonable compilation errors when used incorrectly. So, as you can see, quotation provides a very convenient and type-safe way of manipulating code.
Well, anytime you want to manipulate code programmatically, or do some metaprogramming, quotations make it more declarative, which is a good thing.
I've written two posts about how this makes life easier in Nemerle: here and here.
For real life examples, it's interesting to note that Nemerle itself defines many common statements as macros (where quotations are used). Some examples include: if, for, foreach, while, break, continue and using.
I think quotations have quite different uses in F# and Nemerle. In F#, you don't use quotations to extend the F# language itself, but you use them to take an AST (data representation of code) of some program written in standard F#.
In F#, this is done either by wrapping a piece of code in <@ ..F# code.. @>, or by adding a special attribute to a function:
[<ReflectedDefinition>]
let foo () =
    // body of a function (standard F# code)
Robert already mentioned some uses of this mechanism - you can take the code and translate F# to SQL to query database, but there are several other uses. You can for example:
translate F# code to run on GPU
translate F# code to JavaScript using WebSharper
As Jordão has already mentioned, quotations enable metaprogramming. One real-world example of this is the ability to use quotations to translate F# into another language, such as SQL. In this way quotations serve much the same purpose as expression trees do in C#: they enable LINQ queries to be translated into SQL (or another data-access language) and executed against a data store.
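Python has no quotations, but its standard `ast` module gives a comparable "code as data" view, so the translate-to-SQL idea can be sketched as a loose analogy. This is not how F# quotations or LINQ providers work internally; the `to_sql` and `quote_to_sql` helpers and the tiny supported subset (names, constants, single comparisons, and/or) are assumptions made for the example.

```python
import ast

# Python comparison node types mapped to SQL operators.
SQL_OPS = {ast.Eq: "=", ast.NotEq: "<>", ast.Lt: "<",
           ast.LtE: "<=", ast.Gt: ">", ast.GtE: ">="}

def to_sql(node):
    """Translate a small subset of Python expression AST into SQL text."""
    if isinstance(node, ast.Expression):
        return to_sql(node.body)
    if isinstance(node, ast.BoolOp):
        joiner = " AND " if isinstance(node.op, ast.And) else " OR "
        return "(" + joiner.join(to_sql(v) for v in node.values) + ")"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        operator = SQL_OPS[type(node.ops[0])]
        return f"{to_sql(node.left)} {operator} {to_sql(node.comparators[0])}"
    if isinstance(node, ast.Name):
        return node.id                      # treated as a column name
    if isinstance(node, ast.Constant):
        if isinstance(node.value, str):
            return "'" + node.value.replace("'", "''") + "'"
        return str(node.value)
    raise ValueError(f"unsupported construct: {type(node).__name__}")

def quote_to_sql(source):
    # ast.parse plays the role of the quotation: it yields the code as a tree.
    return to_sql(ast.parse(source, mode="eval"))
```

The point is the same as in the F# case: the code is never executed as written, it is inspected as a tree and re-emitted in another language.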
Unquote is a real-life example of quotation usage.
I'm looking to write a Truth Table Generator as a personal project.
There are several web-based online ones here and here.
(Example screenshot of an existing Truth Table Generator)
I have the following questions:
How should I go about parsing expressions like: ((P => Q) & (Q => R)) => (P => R)
Should I use a parser generator like ANTLR or YACC, or use straight regular expressions?
Once I have the expression parsed, how should I go about generating the truth table? Each section of the expression needs to be divided up into its smallest components and re-built from the left side of the table to the right. How would I evaluate something like that?
Can anyone provide me with tips concerning the parsing of these arbitrary expressions and eventually evaluating the parsed expression?
This sounds like a great personal project. You'll learn a lot about how the basic parts of a compiler work. I would skip trying to use a parser generator; if this is for your own edification, you'll learn more by doing it all from scratch.
The way such systems work is a formalization of how we understand natural languages. If I give you a sentence: "The dog, Rover, ate his food.", the first thing you do is break it up into words and punctuation. "The", "SPACE", "dog", "COMMA", "SPACE", "Rover", ... That's "tokenizing" or "lexing".
The next thing you do is analyze the token stream to see if the sentence is grammatical. The grammar of English is extremely complicated, but this sentence is pretty straightforward. SUBJECT-APPOSITIVE-VERB-OBJECT. This is "parsing".
Once you know that the sentence is grammatical, you can then analyze the sentence to actually get meaning out of it. For instance, you can see that there are three parts of this sentence -- the subject, the appositive, and the "his" in the object -- that all refer to the same entity, namely, the dog. You can figure out that the dog is the thing doing the eating, and the food is the thing being eaten. This is the semantic analysis phase.
Compilers then have a fourth phase that humans do not, which is they generate code that represents the actions described in the language.
So, do all that. Start by defining what the tokens of your language are: define a base class Token and a derived class for each kind of token (IdentifierToken, OrToken, AndToken, ImpliesToken, RightParenToken...). Then write a method that takes a string and returns an IEnumerable<Token>. That's your lexer.
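A lexer along those lines might look roughly like this in Python, where a generator stands in for the lazily produced token sequence. The concrete operator spellings (`=>`, `&`, `|`, `~`) are an assumption: only `=>` and `&` appear in the question, and the class names simply follow the ones suggested above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    text: str

class IdentifierToken(Token): pass
class ImpliesToken(Token): pass
class AndToken(Token): pass
class OrToken(Token): pass
class NotToken(Token): pass
class LeftParenToken(Token): pass
class RightParenToken(Token): pass

# Assumed surface syntax; "=>" is listed first so it is tried before any
# single-character symbol.
SYMBOLS = {"=>": ImpliesToken, "&": AndToken, "|": OrToken, "~": NotToken,
           "(": LeftParenToken, ")": RightParenToken}

def lex(text):
    """Yield tokens one at a time, skipping whitespace."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch.isalpha():
            end = pos
            while end < len(text) and text[end].isalnum():
                end += 1
            yield IdentifierToken(text[pos:end])
            pos = end
        else:
            for symbol, token_class in SYMBOLS.items():
                if text.startswith(symbol, pos):
                    yield token_class(symbol)
                    pos += len(symbol)
                    break
            else:
                raise SyntaxError(f"unexpected character {ch!r} at {pos}")
```

For example, `list(lex("(P => Q) & R"))` yields a LeftParenToken, an IdentifierToken for P, an ImpliesToken, and so on.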
Second, figure out what the grammar of your language is, and write a recursive descent parser that breaks up an IEnumerable<Token> into an abstract syntax tree that represents grammatical entities in your language.
Then write an analyzer that looks at that tree and figures stuff out, like "how many distinct free variables do I have?"
Then write a code generator that spits out the code necessary to evaluate the truth tables. Spitting IL seems like overkill, but if you wanted to be really buff, you could. It might be easier to let the expression tree library do that for you; you can transform your parse tree into an expression tree, and then turn the expression tree into a delegate, and evaluate the delegate.
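Here is a compact Python sketch of the remaining stages: parser, analyzer and evaluation. To stay short it tokenizes with one regex and interprets the tree directly instead of generating IL or a delegate, which is a simplification of the last step described above. The operator precedence (`~` over `&` over `|` over a right-associative `=>`) and the tuple node layout are my assumptions.

```python
import re
from itertools import product

def parse(text):
    """Recursive descent parser producing nested tuples as the AST."""
    # Note: findall silently skips characters that match none of the patterns.
    tokens = re.findall(r"=>|[&|~()]|[A-Za-z]\w*", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def implication():  # right-associative: a => b => c is a => (b => c)
        left = disjunction()
        if peek() == "=>":
            advance()
            return ("implies", left, implication())
        return left

    def disjunction():
        node = conjunction()
        while peek() == "|":
            advance()
            node = ("or", node, conjunction())
        return node

    def conjunction():
        node = negation()
        while peek() == "&":
            advance()
            node = ("and", node, negation())
        return node

    def negation():
        if peek() == "~":
            advance()
            return ("not", negation())
        if peek() == "(":
            advance()
            node = implication()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        token = peek()
        if token is None or not token[0].isalpha():
            raise SyntaxError(f"unexpected token {token!r}")
        advance()
        return ("var", token)

    tree = implication()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def free_variables(node):
    """Analyzer: the set of distinct variable names in the tree."""
    if node[0] == "var":
        return {node[1]}
    return set().union(*(free_variables(child) for child in node[1:]))

def evaluate(node, env):
    kind = node[0]
    if kind == "var":
        return env[node[1]]
    if kind == "not":
        return not evaluate(node[1], env)
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    if kind == "and":
        return left and right
    if kind == "or":
        return left or right
    return (not left) or right  # implies

def truth_table(text):
    tree = parse(text)
    names = sorted(free_variables(tree))
    rows = [(values, evaluate(tree, dict(zip(names, values))))
            for values in product([False, True], repeat=len(names))]
    return names, rows
```

`truth_table("((P => Q) & (Q => R)) => (P => R)")` returns the sorted variable names and one `(inputs, result)` pair per row.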
Good luck!
I think a parser generator is overkill. You could solve this problem by converting the expression to postfix and evaluating the postfix expression (or by building an expression tree directly from the infix expression and using that to generate the truth table).
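As a sketch of the postfix route, the following Python uses the shunting-yard algorithm to convert infix to postfix and then evaluates the result with a stack. The operator symbols and the precedence table are assumptions (the question only shows `=>` and `&`), and error handling is kept to the bare minimum.

```python
import re

# Assumed precedences: higher binds tighter. "~" and "=>" group to the right.
PRECEDENCE = {"~": 4, "&": 3, "|": 2, "=>": 1}
RIGHT_ASSOCIATIVE = {"~", "=>"}

def to_postfix(text):
    """Shunting-yard conversion from infix tokens to a postfix token list."""
    output, stack = [], []
    for token in re.findall(r"=>|[&|~()]|[A-Za-z]\w*", text):
        if token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("unbalanced ')'")
            stack.pop()  # discard the "("
        elif token in PRECEDENCE:
            while stack and stack[-1] != "(" and (
                    PRECEDENCE[stack[-1]] > PRECEDENCE[token]
                    or (PRECEDENCE[stack[-1]] == PRECEDENCE[token]
                        and token not in RIGHT_ASSOCIATIVE)):
                output.append(stack.pop())
            stack.append(token)
        else:
            output.append(token)  # a variable name
    while stack:
        operator = stack.pop()
        if operator == "(":
            raise SyntaxError("unbalanced '('")
        output.append(operator)
    return output

def evaluate_postfix(postfix, env):
    """Evaluate a postfix token list with a value stack."""
    stack = []
    for token in postfix:
        if token == "~":
            stack.append(not stack.pop())
        elif token in PRECEDENCE:
            right, left = stack.pop(), stack.pop()
            if token == "&":
                stack.append(left and right)
            elif token == "|":
                stack.append(left or right)
            else:
                stack.append((not left) or right)  # implication
        else:
            stack.append(env[token])
    return stack.pop()
```

The postfix list only has to be computed once; each row of the truth table is then a single call to `evaluate_postfix` with a different `env`.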
As Mehrdad mentions you should be able to hand roll the parsing in the same time as it would take to learn the syntax of a lexer/parser. The end result you want is some Abstract Syntax Tree (AST) of the expression you have been given.
You then need to build some input generator that creates the input combinations for the symbols defined in the expression.
Then iterate across the input set, generating the results for each input combo, given the rules (AST) you parsed in the first step.
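The input generator and the iteration step are short enough to sketch in Python. Here `rule` is a stand-in for whatever your parser produces: the sketch only assumes it is something callable that takes a dict of symbol values and returns a bool. All function names are made up for the example.

```python
from itertools import product

def input_combinations(symbols):
    """Yield one {symbol: bool} dict per row of the truth table."""
    for values in product([False, True], repeat=len(symbols)):
        yield dict(zip(symbols, values))

def truth_table(symbols, rule):
    """Apply the rule to every input combination."""
    return [(row, rule(row)) for row in input_combinations(symbols)]

def format_table(symbols, table):
    """Render the table as text, using T/F for the truth values."""
    lines = [" ".join(symbols) + " | result"]
    for row, result in table:
        cells = " ".join("T" if row[s] else "F" for s in symbols)
        lines.append(f"{cells} | {'T' if result else 'F'}")
    return "\n".join(lines)

# Hypothetical hand-written rule standing in for a parsed "P => Q".
implies = lambda env: (not env["P"]) or env["Q"]
table = truth_table(["P", "Q"], implies)
```

For N symbols this produces 2^N rows, so the table grows quickly as variables are added.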
How I would do it:
I could imagine using lambda functions to express the AST/rules as you parse the expression, building a symbol table as you go. You could then build the input set and pass the symbol values to the lambda expression tree to calculate the results.
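One way to read that suggestion, sketched in Python: the parser returns a closure for each sub-expression instead of a tree node, and records each new variable name in a symbol table as it goes. The operator symbols, the precedences and the `compile_expression` name are assumptions for illustration.

```python
import re

def both(left, right):
    return lambda env: left(env) and right(env)

def either(left, right):
    return lambda env: left(env) or right(env)

def compile_expression(text):
    """Return (symbols, function); function maps {name: bool} to a bool."""
    tokens = re.findall(r"=>|[&|~()]|[A-Za-z]\w*", text)
    symbols = []  # symbol table, in order of first appearance
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def implication():
        left = disjunction()
        if peek() == "=>":
            advance()
            right = implication()
            return lambda env: (not left(env)) or right(env)
        return left

    def disjunction():
        node = conjunction()
        while peek() == "|":
            advance()
            node = either(node, conjunction())
        return node

    def conjunction():
        node = atom()
        while peek() == "&":
            advance()
            node = both(node, atom())
        return node

    def atom():
        token = advance() if peek() is not None else None
        if token == "~":
            operand = atom()
            return lambda env: not operand(env)
        if token == "(":
            inner = implication()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if token is None or not token[0].isalpha():
            raise SyntaxError(f"unexpected token {token!r}")
        if token not in symbols:
            symbols.append(token)
        return lambda env: env[token]

    function = implication()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return symbols, function
```

The helper functions `both` and `either` exist so that each closure captures its own operands; building the lambdas inline inside the `while` loops would make them all see the loop variable's final value.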
If your goal is processing boolean expressions, a parser generator and all the machinery that goes with it is a waste of time, unless you want to learn how they work (then any of them would be fine).
But it is easy to build a recursive-descent parser by hand for boolean expressions that computes and returns the result of "evaluating" the expression. Such a parser could be used on a first pass to determine the number of unique variables, where "evaluation" means "count 1 for each new variable name".
Writing a generator to produce all possible truth values for N variables is trivial; for each set of values, simply call the parser again and use it to evaluate the expression, where evaluate means "combine the values of the subexpressions according to the operator".
You need a grammar:
formula = disjunction ;
disjunction = conjunction
| disjunction "or" conjunction ;
conjunction = term
| conjunction "and" term ;
term = variable
| "not" term
| "(" formula ")" ;
Yours can be more complicated, but for boolean expressions it can't be that much more complicated.
For each grammar rule, write 1 subroutine that uses a global "scan" index into the string being parsed:
int disjunction()
// returns -1 ==> "not a disjunction" (syntax error)
// in mode 1:
//   returns 0 if the disjunction is false
//   returns 1 if the disjunction is true
{ skipblanks(); // advance scan past blanks
  int temp1 = conjunction();
  if (temp1 == -1) return -1; // syntax error
  while (true)
  { skipblanks();
    if (matchinput("or") == false) return temp1;
    int temp2 = conjunction();
    if (temp2 == -1) return -1; // syntax error: "or" must be followed by a conjunction
    temp1 = temp1 || temp2;
  }
}

int term()
{ skipblanks();
  if (inputmatchesvariablename())
  { variablename = getvariablenamefrominput();
    if (unique(variablename)) numberofvariables += 1; // first-pass counting
    return lookupvariablename(variablename); // get truth table value for name
  }
  ...
}
Each of your parse routines will be about this complicated. Seriously.
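For comparison, here is the same scheme as a runnable Python sketch: one method per grammar rule, a shared scan index, a first pass that only collects variable names, and then one re-parse per row of the table. It follows the grammar given above (`or`, `and`, `not`, parentheses), but signals syntax errors with exceptions instead of returning -1, and the class and method names are my own.

```python
from itertools import product

class Parser:
    def __init__(self, text, values=None):
        self.text, self.scan = text, 0
        self.values = values  # None means "first pass: only collect names"
        self.names = []

    def skip_blanks(self):
        while self.scan < len(self.text) and self.text[self.scan].isspace():
            self.scan += 1

    def match(self, word):
        self.skip_blanks()
        end = self.scan + len(word)
        if not self.text.startswith(word, self.scan):
            return False
        # A keyword must not run into the rest of a longer variable name.
        if word.isalpha() and end < len(self.text) and self.text[end].isalnum():
            return False
        self.scan = end
        return True

    def run(self):
        result = self.disjunction()  # formula = disjunction
        self.skip_blanks()
        if self.scan != len(self.text):
            raise SyntaxError(f"unexpected input at position {self.scan}")
        return result

    def disjunction(self):
        result = self.conjunction()
        while self.match("or"):
            right = self.conjunction()  # always parse before combining
            result = result or right
        return result

    def conjunction(self):
        result = self.term()
        while self.match("and"):
            right = self.term()
            result = result and right
        return result

    def term(self):
        if self.match("not"):
            return not self.term()
        if self.match("("):
            result = self.disjunction()
            if not self.match(")"):
                raise SyntaxError(f"expected ')' at position {self.scan}")
            return result
        start = self.scan  # match() has already skipped blanks
        while self.scan < len(self.text) and self.text[self.scan].isalnum():
            self.scan += 1
        name = self.text[start:self.scan]
        if not name:
            raise SyntaxError(f"expected a variable at position {start}")
        if name not in self.names:
            self.names.append(name)
        return self.values[name] if self.values is not None else False

def truth_table(text):
    counter = Parser(text)
    counter.run()  # pass 1: find the variable names
    rows = []
    for combo in product([False, True], repeat=len(counter.names)):
        values = dict(zip(counter.names, combo))
        rows.append((combo, Parser(text, values).run()))  # re-parse per row
    return counter.names, rows
```

Note that each right-hand operand is parsed into `right` before it is combined; writing `result or self.conjunction()` directly would let Python's short-circuiting skip the parse and leave the scan index in the wrong place.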
You can get the source code of the pyttgen program at http://code.google.com/p/pyttgen/source/browse/#hg/src. It generates truth tables for logical expressions. The code is based on the ply library, so it's very simple :)