antlr text from matching rule in c#


In an ANTLR 3 grammar targeting C#, is it possible to print the full text matched by a rule? Something like this:
rule : FIRST SECOND
{ Console.WriteLine($rule.text); };//does not work.
FIRST: 'first';
SECOND: 'second';

If $rule.text doesn't work (as @dana suggested), you might try $rule.Text or even $rule.GetText().
If all else fails, please tell us which version of the C# port you're using (and where it can be downloaded), so that I (or someone else) can give it a try.

Related

Replace beginning and end of string with unique middle?

I have lots of code like below:
PlusEnvironment.EnumToBool(Row["block_friends"].ToString())
I need to convert them to something like this.
Row["block_friends"].ToString() == "1"
The value passed to EnumToBool varies from call to call: there is no guarantee that it comes from a row. It could be a variable, or even a method that returns a string.
I've tried doing this with regex, but it's sort of sketchy and doesn't work 100%.
PlusEnvironment\.EnumToBool\((.*)\)
I need to do this in Visual Studio's find and replace. I'm using VS 17.

If you had only a few places where PlusEnvironment.EnumToBool() was called, I would have done the same thing that @IanMercer suggested: replace PlusEnvironment.EnumToBool( with an empty string and then fix all the resulting syntax errors.
@IanMercer has also given you a link to some advanced regex usage that will help you.
But if you are skeptical about using such a complex regex on hundreds of files, here is what I would have done:
Define my own PlusEnvironment class with EnumToBool functionality in my own namespace. And then just replace the using Plus; line with using <my own namespace>; in those hundreds of files. That way my changes will be limited to only the using... line, 1 line per file, and it will be simple find and replace, no regex needed.
(Note: I'm assuming that you don't want to use PlusEnvironment, or the complete library and hence you want to do this type of replacement.)

In the Find and Replace window:
Find:
PlusEnvironment\.EnumToBool\((.*)\)
Replace:
$1 == "1"
Make sure "Use Regular Expressions" is selected.
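To check what this kind of pattern does before running it across hundreds of files, here is a minimal sketch. It uses Python's re module rather than Visual Studio, so the group reference is written \1 instead of $1, and it uses the find pattern with the final parenthesis escaped as \). The convert helper and the two sample lines are made up for illustration. It shows the simple case working and the greedy (.*) over-matching when more code follows the call on the same line.

```python
import re

# Find pattern with the closing parenthesis escaped.
# (.*) is greedy, so it extends to the LAST ")" on the line.
PATTERN = re.compile(r'PlusEnvironment\.EnumToBool\((.*)\)')

def convert(line):
    # \1 is Python's spelling of Visual Studio's $1.
    return PATTERN.sub(r'\1 == "1"', line)

# Hypothetical sample lines.
simple = 'bool b = PlusEnvironment.EnumToBool(Row["block_friends"].ToString());'
tricky = 'if (PlusEnvironment.EnumToBool(x) && Foo(y))'

print(convert(simple))  # bool b = Row["block_friends"].ToString() == "1";
print(convert(tricky))  # if (x) && Foo(y) == "1"   <- broken: the match ran too far
```

Lines like the second one are presumably why the asker found the regex "sketchy": matching balanced parentheses needs more than a plain greedy group, so such lines are worth reviewing by hand after the replace.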

C99 grammar in Irony - declaration/statement conflicts

I'm trying to use Irony to parse C99, and I found a grammar online to guide me.
I'm having difficulty with conflicts on declaration versus statement. The following rule fails to detect a pointer declaration with initializer.
blockItemList.Rule = MakePlusRule(blockItemList, blockItem);
blockItem.Rule = declaration | statement;
The type of line it's failing on would be:
MyType *x = foo();
When I remove labeledStatement and expressionStatement from statement's rule (both may start with identifier), this type of declaration is recognized correctly.
What's the best way to force Irony to exhaust the declaration rule before trying statement? Or, can I add to the grammar as Irony parses so that it can register MyType as a terminal rather than an identifier?

I remember having similar problems with function calls and identifiers. I don't think you've done anything particularly wrong; it's just the way grammars work. You need to "fine-tune" the grammar for Irony. As far as I know, Irony is an LALR(1) parser, i.e. it looks only one symbol ahead when making decisions. This might mean that you need to do more work than just define the given grammar.
I had a case where my grammar had conflicts, and I fixed it by lowering the "precision" of the grammar. The actual precision was later restored through AST nodes.
P.S. You can also:
Use the Irony GrammarExplorer to see what conflicts your grammar has. You can sometimes fix the conflicts with PreferShiftHere() or ReduceHere().
And few links that I think are interesting to read:
http://irony.codeplex.com/discussions/400830
http://irony.codeplex.com/discussions/80134
https://irony.codeplex.com/discussions/551074
Context-free grammar understanding is not enough - you have to know
something about parsing methods like LR, LALR(1), LL, etc. Irony is
LALR(1), while Antlr is LL. Grammar rules should be fine-tuned for a
specific method. Irony 'is insisting' on something 'wrong' means that
it took one of two equally possible alternatives resulting from
ambiguities (conflicts!) that it reports. So no point trying to parse
something before you fix the conflicts. To do this - read more about LALR
grammars.

regular expression validator

I have a text box for a phone number and I need to validate it. My requirements are:
Accept only numeric input of 10 or more digits
Accept symbols like ( ) - ,
Can anyone help with this? I tried
^[\d{10,14} +\s +\( +\)-]+$
but it is not working.

You may take a look at the following article which will help you build such expression.

You haven't said what is wrong with your regex (why it's not working as expected), but I'm guessing the issue is that it matches far more than it should, i.e. it will match one or more of any of the characters in your set (rather than just between 10 and 14).
I think your mistake is that you have put way too much in your character set. You've got the + symbol in there 3 times, and it looks like you're trying to use quantifiers inside the set as well, which does not work: inside a set, characters such as { } and + are treated as literals. Character sets are the equivalent of single-character alternations. So, [abc] is the equivalent of a|b|c.
I'm assuming that you want the input to be between 10 and 14 numbers while still allowing any number (zero or more) of the following characters:
+()-,
As some others have suggested, you could just put the characters you want in a set and then specify the quantifier after it, like this: ^[0-9(),+-]{10,14}$ (the hyphen goes last so that it is read literally rather than as a range). This will almost get you there. The only problem is that it allows between 10 and 14 of any of these characters, so it would successfully match this:
,,,,,++()---
Which clearly you don't want (do you?)
So, in order to solve this problem properly, you'll need to be more specific about what is allowed and where in the subject it is allowed. Because I don't know exactly what you want to match, I can't take you much further.
Hopefully the information I've provided here should be good enough to get you started, and if you have more questions... well that's what we're all here for right, so ask away.
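As one possible way to be "more specific", here is a minimal sketch in Python. It rests on an assumption the asker never confirmed: the input must contain 10 to 14 digits in total, and the characters + ( ) - , and whitespace may appear anywhere in between. The lookahead counts the digits and the character set restricts what else may appear. The name is_valid_phone is made up for illustration.

```python
import re

# Assumed rule: 10 to 14 digits overall; + ( ) - , and whitespace allowed anywhere.
#   (?=(?:\D*\d){10,14}\D*$)  lookahead: the whole input holds 10..14 digits
#   [\d()+,\s-]+              only digits and the allowed symbols may appear
PHONE = re.compile(r'^(?=(?:\D*\d){10,14}\D*$)[\d()+,\s-]+$')

def is_valid_phone(text):
    return PHONE.match(text) is not None

print(is_valid_phone("(555) 123-4567"))  # True: 10 digits
print(is_valid_phone(",,,,,++()---"))    # False: no digits at all
print(is_valid_phone("12345"))           # False: too few digits
```

This still says nothing about where the symbols may go (for example, it accepts unbalanced parentheses), so it is a starting point rather than a full validator.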
To help you out with learning, below are a few resources you might find useful (this is a small subset of what's available, so do go ahead and search for yourself):
Testing tools
Rubular (ruby)
GSkinner Regex Tester
RegexHero (dotnet)
Helpful info
Regular-Expressions.Info
Codeproject 30 Minute Tutorial

How to detect a C++ identifier string?

E.g:
isValidCppIdentifier("_foo") // returns true
isValidCppIdentifier("9bar") // returns false
isValidCppIdentifier("var'") // returns false
I wrote some quick code but it fails:
my regex is "[a-zA-Z_$][a-zA-Z0-9_$]*"
and I simply do regex.IsMatch(inputString).
Thanks..

It should work with some added anchoring:
"^[a-zA-Z_][a-zA-Z0-9_]*$"
If you really need to support ludicrous identifiers using Unicode, feel free to read one of the various versions of the standard and add all the ranges into your regexp (for example, pages 713 and 714 of http://www-d0.fnal.gov/~dladams/cxx_standard.pdf)
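The effect of the anchors can be seen in a short sketch. It is written in Python, where re.search plays the role of an unanchored IsMatch-style check; the function name simply mirrors the one in the question. Without ^ and $ the pattern succeeds as soon as any substring looks like an identifier, which is why "9bar" slipped through.

```python
import re

UNANCHORED = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')
ANCHORED = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')

def is_valid_cpp_identifier(name):
    # The anchored pattern must cover the whole string (ASCII identifiers only).
    return ANCHORED.search(name) is not None

# Unanchored: finds "bar" inside "9bar", so the check wrongly succeeds.
print(UNANCHORED.search("9bar") is not None)  # True

print(is_valid_cpp_identifier("_foo"))  # True
print(is_valid_cpp_identifier("9bar"))  # False
print(is_valid_cpp_identifier("var'"))  # False
```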

Matti's answer will work to sanitize identifiers before inserting into C++ code, but won't handle C++ code as input very well. It will be annoying to separate things like L"wchar_t string", where L is not an identifier. And there's Unicode.
Clang, Apple's compiler which is built on a philosophy of modularity, provides a set of tokenizer functions. It looks like you would want clang_createTranslationUnitFromSourceFile and clang_tokenize.
I didn't check to see if it handles \Uxxxx or anything. I can't make any kind of guarantees. The last time I used LLVM was five years ago and it wasn't the greatest experience… but not the worst either.
On the other hand, GCC certainly has it, although you have to figure out how to use cpp_lex_direct.

Regex index in matching string where the match failed

I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?
For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.
Edit for clarification: the reason I need this is to simplify the parsing component of my application. The application is an Assembly language teaching tool which allows students to write, compile, and execute assembly-like programs.
Currently I have a tokenizer class which converts input strings into Tokens using regexes. This works very well. For example:
The tokenizer would produce the following tokens given the following input = "INP :x:":
Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL
These tokens are then analysed to ensure they conform to the syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages, e.g.
if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}
I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.

I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:
Get the Regex class source code (e.g. http://www.codeplex.com/NetMassDownloader to download the .Net source).
Change the code to have a readonly property with the failure index.
Make sure your code uses that Regex rather than Microsoft's.

I guess such an index would only have meaning in some simple cases, like in your example.
If you take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", what should the index you are talking about be?
It will depend on the algorithm used for matching...
It could fail on "abbbc..." or on "abbbcbbc..."

I don't believe it's possible, but I am intrigued why you would want it.

In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.

It is not possible to tell where a regex fails, so you need to take a different approach: compare strings. Use a regex to remove all the parts that can vary, and compare the result with the string that you know does not change.
I ran into the same problem, came across your question, and had to work out my own solution. Here it is:
https://stackoverflow.com/a/11730035/637142
Hope it helps.
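A different workaround, which none of the answers above spells out, is to split the syntax into a fixed sequence of smaller patterns and match them one after another, remembering the position where the first one fails. The following is a minimal sketch in Python under that assumption; first_failure and the sample sub-patterns are hypothetical. It does not backtrack across pieces, so it only suits simple statement syntaxes like the one in the question.

```python
import re

def first_failure(parts, text):
    """Match each sub-pattern in turn, starting where the previous one ended.

    Returns None if the whole text is consumed, otherwise the index at which
    matching stopped. There is no backtracking between sub-patterns.
    """
    pos = 0
    for part in parts:
        m = re.compile(part).match(text, pos)
        if m is None:
            return pos  # this piece could not start here
        pos = m.end()
    # Leftover input after the last piece also counts as a failure.
    return None if pos == len(text) else pos

# Hypothetical syntax for the "INP :x:" statement: opcode, whitespace, label.
statement = [r'[A-Z]+', r'\s+', r':\w+:']

print(first_failure(statement, "INP :x:"))    # None: the statement is valid
print(first_failure(statement, "INP x"))      # 4: a label was expected here
print(first_failure(['a', 'b', 'c'], "abd"))  # 2: the example from the question
```

Because each sub-pattern corresponds to one expected element, the index of the failing piece can also be used to pick an error message such as "Expected label".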
