I have written a grammar for While in ANTLRWorks 1.5.2.
I also added some actions so that when I debug it with some While code, ANTLRWorks shows the three-address code in its output.
My grammar looks like this:
NAME:
LETTER (LETTER | DIGIT | '_')*;
NUMBER:
DIGIT+; // just integers
fragment DIGIT:
'0'..'9';
fragment LETTER:
'A'..'Z' | 'a'..'z';
RELATION:
'<' | '<=' | '==' | '>=' | '>' | '!=' ;
WHITESPACE:
(' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };
Then I generated code from my grammar, and I now have whileParser.cs and whileLexer.cs in the output folder.
Now I want to add my grammar to a C# project.
I want to get input from the user and then show them the output of my grammar,
but I don't know how to add the .g file and the generated classes to a C# project.
I am using Visual Studio 2013.
Can anybody help me?
Your grammar contains Java code blocks, so you need to translate them to C# first. This may also be a good opportunity to switch to ANTLR 4 and/or to a parse tree approach. There is also an ANTLRWorks 2 version, mostly for ANTLR 4, should you need it.
Anyway, just install the ANTLR Visual Studio Plugin and let it handle that for you. It works with both ANTLR 3 and 4.
You'll then have to add the ANTLR runtime to your project. For this, install either the ANTLR 4 NuGet package or the ANTLR 3 one, depending on which version you end up using.
I have something along these lines:
method_declaration : protection? expression identifier LEFT_PARENTHESES (method_argument (COMMA method_argument)*)? RIGHT_PARENTHESES method_block;
expression
: ...
| ...
| identifier
| kind
;
identifier : IDENTIFIER ;
kind : ... | ... | VOID_KIND; // void, for example; there are more
IDENTIFIER : (LETTER | '_') (LETTER | DIGIT | '_')*;
VOID_KIND : 'void';
fragment LETTER : [a-zA-Z];
fragment DIGIT : [0-9];
*The other rules in method_declaration are not relevant for this question.
What happens is that when I input something such as void Start() { }
and look at the parse tree, ANTLR seems to think void is an identifier rather than a kind, and treats it as such.
I tried changing the order in which kind and identifier are written in the .g4 file, but it doesn't seem to make any difference. Why does this happen, and how can I fix it?
The order in which parser rules are defined makes no difference in ANTLR. The order in which token rules are defined does though. Specifically, when multiple token rules match on the current input and produce a token of the same length, ANTLR will apply the one that is defined first in the grammar.
So in your case, you'll want to move VOID_KIND (and any other keyword rules you may have) before IDENTIFIER. That is pretty much what you already tried, except with the lexer rules instead of the parser rules.
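To see the selection rule in action without running ANTLR, here is a toy C# model of the behaviour described above. The longest match wins, and on a tie the rule listed first wins. It is a sketch, not ANTLR's implementation. ToyLexer and Classify are hypothetical names, and .NET regular expressions stand in for the lexer rules: [a-zA-Z_][a-zA-Z0-9_]* for IDENTIFIER and void for VOID_KIND.

```csharp
using System.Text.RegularExpressions;

// Toy model of ANTLR's lexer rule selection (hypothetical, not ANTLR's real code):
// every rule is tried at the start of the input, the longest match wins, and
// when several rules match the same length the one listed first wins.
public static class ToyLexer
{
    // Rules must be passed in grammar order; returns the winning rule name, or null.
    public static string Classify(string input, params (string Name, string Pattern)[] rules)
    {
        string winner = null;
        int winnerLength = 0; // zero-length matches are ignored
        foreach (var rule in rules)
        {
            // Anchor the rule's pattern at the start of the input.
            Match m = Regex.Match(input, "^(?:" + rule.Pattern + ")");
            // Strictly greater: an equally long match from a later rule does not
            // replace an earlier one, which models the "first rule wins" tie-break.
            if (m.Success && m.Length > winnerLength)
            {
                winner = rule.Name;
                winnerLength = m.Length;
            }
        }
        return winner;
    }
}
```

With IDENTIFIER listed first, "void Start()" is classified as IDENTIFIER. With VOID_KIND listed first, it becomes VOID_KIND. "voidness" is still IDENTIFIER in either order, because the longer match wins before order is considered.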
PS: I'm somewhat surprised that ANTLR doesn't produce a warning about VOID_KIND being unmatchable in this case. I'm pretty sure other lexer generators would produce such a warning in cases like this.
ANTLR build system:
Visual Studio 2017, C#
NuGet packages: Antlr4.CodeGenerator 4.6.5-rc002, Antlr4.Runtime 4.6.5-rc002
I've got the following Flex rule which I'd like to convert to ANTLR 4:
NOT_NAME [^[:alpha:]_*\n]+
I think I've already found out that ANTLR doesn't support POSIX or Unicode character classes, but that you can create fragments to include them in your lexer grammar.
In my attempt to translate the above rule I've already created the following fragments:
fragment ALPHA: L | Nl;
fragment L : Ll | Lm | Lo | Lt | Lu ;
fragment Ll : '\u0061'..'\u007A' ; /* rest omitted for brevity */
fragment Lm : '\u02B0'..'\u02C1' ; /* rest omitted for brevity */
fragment Lo : '\u00AA' | '\u00BA' ; /* rest omitted for brevity */
fragment Lt : '\u01C5' | '\u01C8' ; /* rest omitted for brevity */
fragment Lu : '\u0041'..'\u005A' ; /* rest omitted for brevity */
fragment Nl : '\u16EE'..'\u16F0' ; /* rest omitted for brevity */
The ANTLR rule I had thought would work was the following:
NOT_NAME: ~(ALPHA | '_' | '*' | '\n')+;
but it gives me the following error:
rule reference 'ALPHA' is not currently supported in a set
The problem seems to be the negation, as rules without negation work without problems.
I know that it works if I inline all the above fragments into one rule, but that seems insanely complicated to me, especially given the simple and straightforward Flex rule.
I must be missing some elegant trick that you can point me to.
The Unicode character set support doesn't depend on the target runtime. The ANTLR 4 tool itself converts the grammars and also parses the charset definitions. You should be able to use any of the Unicode classes as laid out in the lexer documentation. I'm not sure, however, whether you can negate such a block with the tilde. At least there is the option of using \P... to negate a character class (also mentioned in that document).
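Whichever ANTLR form you end up with, it can help to have a reference for what NOT_NAME should match, so you can compare the lexer's output against it. .NET regular expressions support Unicode general categories, so the question's intent can be written down directly. ALPHA is taken as categories L and Nl, as in the question's fragments, and '_', '*' and newline are excluded. This is a cross-check sketch, not ANTLR syntax. NotNameReference and Runs are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Reference (not ANTLR syntax) for the translated NOT_NAME rule:
// one or more characters that are NOT a Unicode letter (L), a letter number (Nl),
// '_', '*' or '\n'. This mirrors the question's ALPHA = L | Nl fragments.
public static class NotNameReference
{
    static readonly Regex NotName = new Regex(@"[^\p{L}\p{Nl}_*\n]+");

    // Returns every maximal run of characters that NOT_NAME is expected to match.
    public static string[] Runs(string input)
    {
        return NotName.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }
}
```

For example, Runs("ab, 12*cd\ne") returns the single run ", 12". The '*', the newline and the letters all break or end runs.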
I have a table in which I save an ID and a rule like:
| ID | Rule |
|------|--------------------------------------|
| 1 | firstname[0]+'.'+lastname+'#'+domain |
| 2 | firstname+'_'+lastname+'#'+domain |
| 3 | lastname[0]+firstname+'#'+domain |
My problem is: how can I get and analyze/execute that rule in my code? The cell is read as a string, and I don't know how to apply that rule to my variables.
I was thinking about String.Format, but I don't know how to split a string so that only its first character is taken.
If you could give me some advice or a better way to do this, I'd appreciate it, because I'm completely lost.
If that is C#, you could construct a LINQ Expression from the parse tree produced by, for example, ANTLR, or, if the format is very simple, by a regex.
You have to take these steps:
Parse the incoming string using ANTLR. You could start off with the C# grammar;
Build an expression from it;
Run the expression, passing the firstname, domain, etc. parameters.
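The three steps can be sketched in C# as follows. The sketch assumes the rules only ever use the variables firstname, lastname and domain, quoted literals, '+' as the separator, and single-character indexing such as firstname[0]. A regex tokenizer stands in for the ANTLR parser, which is only reasonable because the format is this simple. RuleCompiler and Compile are hypothetical names. The expression tree is compiled once into a delegate that can then be run for every row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Text.RegularExpressions;

// Hypothetical sketch: compile a rule such as  firstname[0]+'.'+lastname+'#'+domain
// into a Func<firstname, lastname, domain, result>.
public static class RuleCompiler
{
    // One term: a quoted literal, or a variable with an optional [index].
    static readonly Regex Term = new Regex(
        @"^\s*(?:'(?<lit>[^']*)'|(?<var>[A-Za-z_]\w*)(?:\[(?<idx>\d+)\])?)\s*$");

    public static Func<string, string, string, string> Compile(string rule)
    {
        // The variable names are an assumption taken from the example table.
        var parameters = new Dictionary<string, ParameterExpression>
        {
            ["firstname"] = Expression.Parameter(typeof(string), "firstname"),
            ["lastname"] = Expression.Parameter(typeof(string), "lastname"),
            ["domain"] = Expression.Parameter(typeof(string), "domain"),
        };

        var parts = new List<Expression>();
        foreach (var piece in rule.Split('+')) // assumes literals never contain '+'
        {
            var m = Term.Match(piece);
            if (!m.Success)
                throw new FormatException("Cannot parse term: " + piece);

            if (m.Groups["lit"].Success)
            {
                parts.Add(Expression.Constant(m.Groups["lit"].Value));
                continue;
            }

            ParameterExpression variable;
            if (!parameters.TryGetValue(m.Groups["var"].Value, out variable))
                throw new FormatException("Unknown variable: " + m.Groups["var"].Value);

            if (m.Groups["idx"].Success)
            {
                // variable[i] becomes variable.Chars[i].ToString(); throws at run time
                // if the value is shorter than i + 1.
                var index = Expression.Constant(int.Parse(m.Groups["idx"].Value));
                var ch = Expression.Property(variable, "Chars", index);
                parts.Add(Expression.Call(ch, typeof(char).GetMethod("ToString", Type.EmptyTypes)));
            }
            else
            {
                parts.Add(variable);
            }
        }

        // string.Concat(string[]) joins all the pieces in order.
        var concat = typeof(string).GetMethod("Concat", new[] { typeof(string[]) });
        var body = Expression.Call(concat, Expression.NewArrayInit(typeof(string), parts));
        return Expression.Lambda<Func<string, string, string, string>>(
            body, parameters["firstname"], parameters["lastname"], parameters["domain"]).Compile();
    }
}
```

For example, compiling rule 1 and calling the result with "John", "Smith" and "example.com" gives J.Smith#example.com. Rule 3 gives SJohn#example.com.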
I'm not sure it would do the trick, but you might want to look at CSharpCodeProvider. I've never used it, but according to the examples, it seems capable of compiling code entered in a textbox.
The catch is that this solution generates an exe file that is stored in your project folder. Even if you delete it after a successful compilation, that might not be the best option.
I'm doing a basic CSV import/export in C#. Most of it is really simple; we have just one special requirement.
Some of the values we import/export contain special characters that are not ASCII. To ease the work of our end users, the customer decided to convert some of these characters into other values on export, and to do the opposite when importing.
Some examples:
Value in our application | Values that must be accepted on parsing
-----------------------------------------------------------------------
³ | 3, ^3, **3
μ | u
₃ | 3
⁹ | 9
° | deg
φ | phi
Exporting is easy: we replace the matching character with the first value in the second column.
Parsing is more complicated, though, and I don't see an easy way to get all the possible values on import.
One example:
H³ 3° (asd)₃
Would be exported as
H3 3deg (asd)3
So to parse this correctly, I have to get all the possibilities:
H3 3deg (asd)3 //This may be a real value
H³ 3deg (asd)3
H₃ 3deg (asd)3
H3 ³deg (asd)3
....
What would be the good way of doing this?
I doubt it's possible with such an encoding. All H3 values are equally likely unless there is a rule that differentiates them. This makes parsing more difficult, not less.
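To put a number on that ambiguity, the sketch below enumerates every original value that could have produced a given exported string. At each position it branches on "plain character" versus "start of an exported spelling". The reverse table is an assumed subset of the question's mapping, and ImportCandidates is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: list every original value that exports to `exported`.
// The reverse table is an assumed subset of the mapping from the question.
public static class ImportCandidates
{
    static readonly Dictionary<string, string[]> Reverse = new Dictionary<string, string[]>
    {
        ["3"] = new[] { "\u00B3", "\u2083" }, // superscript 3 or subscript 3
        ["^3"] = new[] { "\u00B3" },          // superscript 3
        ["**3"] = new[] { "\u00B3" },         // superscript 3
        ["9"] = new[] { "\u2079" },           // superscript 9
        ["u"] = new[] { "\u03BC" },           // mu
        ["deg"] = new[] { "\u00B0" },         // degree sign
        ["phi"] = new[] { "\u03C6" },         // phi
    };

    public static List<string> All(string exported)
    {
        var results = new List<string>();
        Expand(exported, 0, new StringBuilder(), results);
        return results;
    }

    static void Expand(string s, int pos, StringBuilder current, List<string> results)
    {
        if (pos == s.Length)
        {
            results.Add(current.ToString());
            return;
        }
        int mark = current.Length;

        // Choice 1: the character was plain text in the original value.
        current.Append(s[pos]);
        Expand(s, pos + 1, current, results);
        current.Length = mark;

        // Choice 2: an exported spelling starts here and stands for a symbol.
        foreach (var entry in Reverse)
        {
            string key = entry.Key;
            if (pos + key.Length > s.Length || string.CompareOrdinal(s, pos, key, 0, key.Length) != 0)
                continue;
            foreach (var symbol in entry.Value)
            {
                current.Append(symbol);
                Expand(s, pos + key.Length, current, results);
                current.Length = mark;
            }
        }
    }
}
```

For H3 3deg (asd)3 this returns 54 candidates: three choices for each of the three 3s and two for deg. Nothing in the data says which one is right, and the count grows multiplicatively with every ambiguous character.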
What you are trying to do, though, looks a lot like what has already been done in tools like LaTeX or even Word. You should probably use the encodings used by LaTeX, since they've already done the work of mapping symbols to human-readable and editable keywords that can be parsed easily, e.g. use ^ for powers, _ for indices, \degree for degrees, etc.
In fact, even Word nowadays accepts these same keywords in its Math editor, letting you type \sum to get ∑, or \oint for ∮.
You should probably also tag the fields that contain substitutions, e.g. by surrounding them in multiple braces, so that users can still use the keywords in their own text.
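Here is a minimal sketch of the tagging idea. It assumes symbols are exported as {{keyword}} with LaTeX-like keywords; the exact keyword spellings and the TaggedCodec name are illustrative assumptions. Only text inside the tags is decoded on import, so a plain 3 or deg in user text is never rewritten.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: symbols are exported as {{keyword}} and only tagged
// text is decoded on import. Keyword spellings are illustrative assumptions.
public static class TaggedCodec
{
    static readonly Dictionary<string, string> Keywords = new Dictionary<string, string>
    {
        ["^3"] = "\u00B3",       // superscript 3
        ["_3"] = "\u2083",       // subscript 3
        ["^9"] = "\u2079",       // superscript 9
        [@"\degree"] = "\u00B0", // degree sign
        [@"\mu"] = "\u03BC",     // mu
        [@"\phi"] = "\u03C6",    // phi
    };

    // Matches the shortest {{...}} span.
    static readonly Regex Tag = new Regex(@"\{\{(.*?)\}\}");

    public static string Encode(string value)
    {
        // Keywords never contain any of the symbol characters, so the order of
        // replacements does not matter.
        foreach (var kv in Keywords)
            value = value.Replace(kv.Value, "{{" + kv.Key + "}}");
        return value;
    }

    public static string Decode(string field)
    {
        return Tag.Replace(field, m =>
        {
            string symbol;
            // Unknown keywords are kept verbatim instead of being silently dropped.
            return Keywords.TryGetValue(m.Groups[1].Value, out symbol) ? symbol : m.Value;
        });
    }
}
```

Encoding H³ 3° (asd)₃ gives H{{^3}} 3{{\degree}} (asd){{_3}}, and decoding that gives the original back. An untagged H3 3deg passes through unchanged.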
I think you need to exclude ambiguous mappings. E.g.:
³ | ^3, **3
₃ | 3
⁹ | ^9, **9
or
³ | 3, ^3, **3
₃ | _3
⁹ | 9
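With the first table above, each exported spelling maps back to exactly one symbol, so import becomes a single deterministic pass instead of a search. Here is a sketch with UnambiguousDecoder as a hypothetical name. Longer spellings are placed first in the pattern; that only matters if one spelling is ever a prefix of another, which is not the case in this table.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: decoding with the first unambiguous table from the answer.
public static class UnambiguousDecoder
{
    static readonly Dictionary<string, string> Reverse = new Dictionary<string, string>
    {
        ["^3"] = "\u00B3",  // superscript 3
        ["**3"] = "\u00B3", // superscript 3
        ["3"] = "\u2083",   // subscript 3
        ["^9"] = "\u2079",  // superscript 9
        ["**9"] = "\u2079", // superscript 9
    };

    // Alternation of all exported spellings, longest first, with regex
    // metacharacters such as '^' and '*' escaped.
    static readonly Regex Pattern = new Regex(
        string.Join("|", Reverse.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k))));

    public static string Decode(string exported)
    {
        return Pattern.Replace(exported, m => Reverse[m.Value]);
    }
}
```

Decoding H^3 H**3 H3 x^9 gives H³ H³ H₃ x⁹. A lone 9 stays 9, because this table has no plain-9 entry.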
ASCII uses 7 bits per character. Now you want to use characters that don't fit in those 7 bits (characters that UTF-8, for example, encodes with more than one byte).
By converting your UTF-8 characters to ASCII you lose information, but you want to get the full information back.
To manage this, you need a mask that helps recognize the right character.
You could use special characters as your mask. That way you don't reinvent the wheel, and others can find documentation for your interface all over the internet.
But if you map ³ => 3, you lose information (superscript 3 => 3: where did the superscript go, and how should you guess the right choice?)
While writing an Antlr3 grammar in AntlrWorks (generating C#), I wrote the following set of rules:
array :
'[' properties? ']' -> ^(ARR properties?)
;
properties :
propertyName (','! propertyName)*
;
propertyName :
ID
| ESC_ID
;
ESC_ID :
'\'' ESC_STRING '\''
;
fragment
ESC_STRING
: ( ESCAPE_SEQ | ~('\u0000'..'\u001f' | '\\' | '\"' ) )*
;
However, whenever I try to parse any string where the ESC_ID rule is matched, I hit a phantom EOF character at the end of the string:
Input: ['testing 123']
<mismatched token: [#4,15:15='<EOF>',<-1>,1:15]
I know that the Java version of ANTLR's generated code is not thoroughly debugged, but I've managed to find my way around the quirks so far. Any thoughts on how to avoid this error when matching this lexer rule?
UPDATE
I have now tried using the official C# port of Antlr3, and I still get the same error.
ANTLRWorks can't be used to generate code for the C# targets. You'll need to generate your C# code using the Antlr3.exe tool that's included in the C# port. The preferred method is using the MSBuild integration, which can either be done manually or (finally!) automatically using NuGet.
The latest official release is found here:
http://www.antlr.org/wiki/display/ANTLR3/Antlr3CSharpReleases
In addition to that, I have released an alpha build of ANTLR 3 on NuGet. If you enable the "Include Prereleases" in the NuGet package manager in Visual Studio 2010+, you'll find it listed as ANTLR 3 version 3.5.0.3-alpha002.