Antlr3 (C# Target) Consistently Hits Phantom EOF Character

While writing an Antlr3 grammar in AntlrWorks (generating C#), I wrote the following parser and lexer rules:
array :
'[' properties? ']' -> ^(ARR properties?)
;
properties :
propertyName (','! propertyName)*
;
propertyName :
ID
| ESC_ID
;
ESC_ID :
'\'' ESC_STRING '\''
;
fragment
ESC_STRING
: ( ESCAPE_SEQ | ~('\u0000'..'\u001f' | '\\' | '\"' ) )*
;
However, whenever I try to parse any string where the ESC_ID rule is matched, I hit a phantom EOF character at the end of the string:
Input: ['testing 123']
<mismatched token: [#4,15:15='<EOF>',<-1>,1:15]
I know that the Java version of ANTLR's generated code is not thoroughly debugged, but I've managed to find my way around the quirks so far. Thoughts on how not to hit this error when matching this lexer rule?
UPDATE
I have now tried using the official C# port of Antlr3, and I still get the same error.

ANTLRWorks can't be used to generate code for the C# targets. You'll need to generate your C# code using the Antlr3.exe tool that's included in the C# port. The preferred method is using the MSBuild integration, which can either be done manually or (finally!) automatically using NuGet.
The latest official release is found here:
http://www.antlr.org/wiki/display/ANTLR3/Antlr3CSharpReleases
In addition to that, I have released an alpha build of ANTLR 3 on NuGet. If you enable the "Include Prereleases" option in the NuGet package manager in Visual Studio 2010+, you'll find it listed as ANTLR 3 version 3.5.0.3-alpha002.

Related

How can I use System.CommandLine.DragonFruit to parse options if the options are not preceded by a double dash?

I have an existing console app that takes option args that don't have the -- in front of them. For example:
./myapp businessDate=20230128 type=charley location=nashville
Currently I parse the args[] array tokens and split them around the "=" to get the key/value pairs.
I can't change that, because other programs already call it that way. But it appears that DragonFruit needs to have it this way instead:
./myapp --businessDate=20230128 --type=charley --location=nashville
So my question is: can DragonFruit be configured to NOT require the -- prefix on options when I use the equals sign to separate the key from the value, the way I currently do it in the first example above? I believe the answer is "no", because of POSIX compliance. But maybe I missed something.
What I Tried
I ran the example program from "Building your first app with System.CommandLine.DragonFruit", and it worked fine:
$ dotnet run --int-option=123 --bool-option=true
The value for --int-option is: 123
The value for --bool-option is: True
The value for --file-option is: null
But when I tried it without the --, hoping it would work as above, it gave me errors instead:
$ dotnet run int-option=123 bool-option=true
Unrecognized command or argument 'int-option=123'.
Unrecognized command or argument 'bool-option=true'.
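The key/value handling described in the question (splitting each args[] token around "=") can be sketched as follows. This only illustrates that existing approach and is not a DragonFruit feature; the names KeyValueArgs and Parse are hypothetical, and ignoring tokens without "=" is an assumption made for the sketch.

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueArgs
{
    // Hypothetical helper: turns tokens such as "type=charley" into key/value pairs.
    // Tokens without "=" (or with an empty key) are ignored in this sketch.
    public static Dictionary<string, string> Parse(string[] args)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string token in args)
        {
            int eq = token.IndexOf('=');
            if (eq <= 0)
            {
                continue;
            }
            // Split on the first "=" only, so values may themselves contain "=".
            result[token.Substring(0, eq)] = token.Substring(eq + 1);
        }
        return result;
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        foreach (KeyValuePair<string, string> pair in KeyValueArgs.Parse(args))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Called as ./myapp businessDate=20230128 type=charley location=nashville, this yields three entries keyed businessDate, type and location.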

Negate POSIX or Unicode character classes in ANTLR Lexer (C#)

ANTLR build system:
Visual Studio 2017, C#
NuGet packages: Antlr4.CodeGenerator 4.6.5-rc002, Antlr4.Runtime 4.6.5-rc002
I've got the following Flex rule which I'd like to convert to ANTLR 4:
NOT_NAME [^[:alpha:]_*\n]+
I think that I've already found out that ANTLR doesn't support POSIX or Unicode character classes but that you can create fragments to include them into your lexer grammar.
In my attempt to translate the above rule I've already created the following fragments:
fragment ALPHA: L | Nl;
fragment L : Ll | Lm | Lo | Lt | Lu ;
fragment Ll : '\u0061'..'\u007A' ; /* rest omitted for brevity */
fragment Lm : '\u02B0'..'\u02C1' ; /* rest omitted for brevity */
fragment Lo : '\u00AA' | '\u00BA' ; /* rest omitted for brevity */
fragment Lt : '\u01C5' | '\u01C8' ; /* rest omitted for brevity */
fragment Lu : '\u0041'..'\u005A' ; /* rest omitted for brevity */
fragment Nl : '\u16EE'..'\u16F0' ; /* rest omitted for brevity */
The ANTLR rule I had thought would work was the following:
NOT_NAME: ~(ALPHA | '_' | '*' | '\n')+;
but it gives me the following error:
rule reference 'ALPHA' is not currently supported in a set
The problem seems to be the negation as rules without negation seem to work without problems.
I know that it works if I inline all the above fragments into one rule but this appears insanely complicated to me - especially given the pretty simple and straightforward Flex rule.
I must be missing some elegant trick that you will possibly point me to.
The Unicode character set support doesn't depend on the target runtime. The ANTLR4 tool itself converts the grammars and also parses the charset definitions. You should be able to use any of the Unicode classes as laid out in the lexer documentation. I'm not sure, however, whether you can negate that block with the tilde. At least there is the option to use \P... to negate a char class (also mentioned in that document).

How to connect ANTLRWorks output to a C# project?

I have written a grammar for while in ANTLRWorks 1.5.2.
I also added some actions, so that when I debug my grammar with a while program, it shows three-address code in the ANTLRWorks output.
My grammar looks like this:
NAME:
LETTER (LETTER | DIGIT | '_')*;
NUMBER:
DIGIT+; // just integers
fragment DIGIT:
'0'..'9';
fragment LETTER:
'A'..'Z' | 'a'..'z';
RELATION:
'<' | '<=' | '==' | '>=' | '>' | '!=' ;
WHITESPACE:
(' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };
I generated code from my grammar, and I now have whileParser.cs and whileLexer.cs in the output folder.
Now I want to add my grammar to a C# project.
I want to get input from the user and then show them the output of my grammar.
I don't know how to add the .g file and the generated classes to a C# project.
I am using Visual Studio 2013.
Can anybody help me?
Your grammar contains Java code blocks; you need to translate them to C# first. Actually, it may be a good opportunity for you to use ANTLR 4 instead and/or to switch to a parse tree approach. I should mention there's an ANTLRWorks 2 version, mostly for ANTLR 4, should you need it.
Anyway, just install the ANTLR Visual Studio Plugin and let it handle that for you. It works with both ANTLR 3 and 4.
You'll then have to add the ANTLR runtime to your project. For this, you can install the ANTLR4 NuGet or the ANTLR3 version depending on what version you chose to use in the end.

Making ReSharper put a space between the class/namespace identifier and the curly brace

I've just installed ReSharper on my machine, and by default it presents me with the following C# code formatting:
namespace machineLearning{
public class Class1{
}
}
I've tried fiddling with the different options on Options -> C# -> Formatting Style but I can't seem to find what the option to correct this behaviour is. There seems to be no option explicitly or less-explicitly concerning adding a space between the identifier and the following brace.
How to accomplish that?
In ReSharper 7, it is under ReSharper -> Manage Options -> Code Editing -> C# -> Formatting Style -> Braces Layout -> Method declaration -> (set the value to) At End of line (K & R Style)

Parser Generator: How to use GPLEX and GPPG together?

After looking through posts for good C# parser generators, I stumbled across GPLEX and GPPG. I'd like to use GPLEX to generate tokens for GPPG to parse and create a tree (similar to the lex/yacc relationship). However, I can't seem to find an example on how these two interact together. With lex/yacc, lex returns tokens that are defined by yacc, and can store values in yylval. How is this done in GPLEX/GPPG (it is missing from their documentation)?
Attached is the lex code I would like to convert over to GPLEX:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[Oo][Rr] return OR;
[Aa][Nn][Dd] return AND;
[Nn][Oo][Tt] return NOT;
[A-Za-z][A-Za-z0-9_]* yylval=yytext; return ID;
%%
Thanks!
Andrew
First: add a reference to "QUT.ShiftReduceParser.dll" in your project. It is provided in the GPLEX download package.
Sample-Code for Main-Program:
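using System;
using System.IO; // FileStream, FileMode
using ....;
using QUT.Gppg;
using Scanner;
using Parser;
namespace NCParser
{
    class Program
    {
        static void Main(string[] args)
        {
            string pathTXT = @"C:\temp\testFile.txt";
            // Dispose the stream once parsing is done.
            using (FileStream file = new FileStream(pathTXT, FileMode.Open))
            {
                // Fully qualified: the namespaces and the classes share the names "Scanner" and "Parser".
                Scanner.Scanner scanner = new Scanner.Scanner();
                scanner.SetSource(file, 0);
                Parser.Parser parser = new Parser.Parser(scanner);
                parser.Parse(); // run the parser; returns true on success
            }
        }
    }
}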
Sample-Code for GPLEX:
%using Parser; //include the namespace of the generated Parser-class
%Namespace Scanner //names the Namespace of the generated Scanner-class
%visibility public //visibility of the types "Tokens","ScanBase","Scanner"
%scannertype Scanner //names the Scannerclass to "Scanner"
%scanbasetype ScanBase //names the Scanbaseclass to "ScanBase"
%tokentype Tokens //names the Tokenenumeration to "Tokens"
%option codePage:65001 out:Scanner.cs /*see the documentation of GPLEX for further Options you can use */
%{ //user-specified code will be copied in the Output-file
%}
OR [Oo][Rr]
AND [Aa][Nn][Dd]
Identifier [A-Za-z][A-Za-z0-9_]*
%% //Rules Section
%{ //user-code that will be executed before getting the next token
%}
{OR} {return (int)Tokens.kwOR;}
{AND} {return (int)Tokens.kwAND;}
{Identifier} {yylval = yytext; return (int)Tokens.ID;}
%% //User-code Section
Sample-Code for GPPG-input-file:
%using Scanner //include the Namespace of the scanner-class
%output=Parser.cs //names the output-file
%namespace Parser //names the namespace of the Parser-class
%parsertype Parser //names the Parserclass to "Parser"
%scanbasetype ScanBase //names the ScanBaseclass to "ScanBase"
%tokentype Tokens //names the Tokensenumeration to "Tokens"
%token kwAND "AND", kwOR "OR" //the received Tokens from GPLEX
%token ID
%% //Grammar Rules Section
program : /* nothing */
| Statements
;
Statements : EXPR "AND" EXPR
| EXPR "OR" EXPR
;
EXPR : ID
;
%% //User-code Section
// Don't forget to declare the Parser-Constructor
public Parser(Scanner scnr) : base(scnr) { }
I had a similar issue - not knowing how to use my output from GPLEX with GPPG due to an apparent lack of documentation. I think the problem stems from the fact that the GPLEX distribution includes gppg.exe along with gplex.exe, but only documentation for GPLEX.
If you go to the GPPG homepage and download that distribution, you'll get the documentation for GPPG, which describes the requirements for the input file, how to construct your grammar, etc. Oh, and you'll also get both binaries again - gppg.exe and gplex.exe.
It almost seems like it would be simpler to just include everything in one package. It could definitely clear up some confusion, especially for those who may be new to lexical analysis (tokenization) and parsing (and may not be 100% familiar yet with the differences between the two).
So anyway, for those who may be doing this for the first time:
GPLEX http://gplex.codeplex.com - used for tokenization/scanning/lexical analysis (same thing)
GPPG http://gppg.codeplex.com/ - takes output from a tokenizer as input to parse. For example, parsers use grammars and can do things a simple tokenizer cannot, like detect whether sets of parentheses match up.
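To make that distinction concrete: checking that parentheses match needs state that tracks nesting depth, which is why it is handled above the token level. The following is a minimal hand-written sketch, not GPPG output; the names ParenChecker and IsBalanced are hypothetical.

```csharp
using System;

public static class ParenChecker
{
    // Returns true when every ')' closes an earlier '(' and nothing is left open.
    public static bool IsBalanced(string input)
    {
        int depth = 0;
        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                depth--;
                if (depth < 0)
                {
                    return false; // a ')' with no matching '('
                }
            }
        }
        return depth == 0;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(ParenChecker.IsBalanced("(a AND (b OR c))")); // prints True
        Console.WriteLine(ParenChecker.IsBalanced("(a AND b))"));       // prints False
    }
}
```

A grammar-based parser expresses the same idea declaratively through a recursive rule instead of an explicit counter.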
Some time ago I had the same need to use GPLEX and GPPG together, and to make the job much easier I created a NuGet package for using GPPG and GPLEX together in Visual Studio.
This package can be installed in C# projects based on .NET Framework and adds some command-lets to the Package Manager Console in Visual Studio. These command-lets help you configure the C# project to integrate GPPG and GPLEX into the build process. Essentially, you edit the YACC and LEX files in your project as source code, and the parser and the scanner are generated during the build. In addition, the command-lets add to the project the files needed for customizing the parser and the scanner.
You can find it here:
https://www.nuget.org/packages/YaccLexTools/
And here is a link to the blog post that explains how to use it:
http://ecianciotta-en.abriom.com/2013/08/yacclex-tools-v02.html
Have you considered using Roslyn? (This isn't a proper answer but I don't have enough reputation to post this as a comment)
Ironic, because when I jumped into parsers in C# I started with exactly those 2 tools (about a year ago). The lexer had a tiny bug (easy to fix):
http://gplex.codeplex.com/workitem/11308
but the parser had a more severe one:
http://gppg.codeplex.com/workitem/11344
Lexer should be fixed (release date is June 2013), but parser probably still has this bug (May 2012).
So I wrote my own suite :-) https://sourceforge.net/projects/naivelangtools/ and have used and developed it since then.
Your example translates (in NLT) to:
/[Oo][Rr]/ -> OR;
/[Aa][Nn][Dd]/ -> AND;
/[Nn][Oo][Tt]/ -> NOT;
// by default text is returned as value
/[A-Za-z][A-Za-z0-9_]*/ -> ID;
The entire suite is similar to lex/yacc; where possible, it does not rely on side effects (so you return the appropriate value).
