Does anyone know of a way in the Microsoft .NET framework to check the syntax, and only the syntax, of a given C# file?
For a little background, what I am interested in doing is setting up syntastic to check the syntax of .cs files. Out of the box, syntastic uses the Mono C# compiler with its --parse flag to perform this operation, but I can find no equivalent in the Microsoft .NET Framework.
My first attempt to get this to work was to use csc /target:library /nologo in place of mcs --parse, but the problem is that this is called on a per-file basis. As a result, it reports missing namespaces (which exist in the full project build) instead of only syntactic errors.
You can do this via the Roslyn CTP. It allows you to parse the .cs file entirely, and walk the full tree, looking for errors.
For details, I recommend downloading the Walkthrough: Getting Started with Syntax Analysis for C#, as it shows you the basic approach to looking at syntax trees in a C# file.
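A minimal sketch of the Roslyn approach, assuming the released Microsoft.CodeAnalysis.CSharp NuGet package (the CTP's API has changed since then). Parsing one file with CSharpSyntaxTree.ParseText and asking the tree for its diagnostics reports only syntax errors, because no compilation, references, or other project files are involved, so missing namespaces are not flagged. The class name SyntaxOnlyChecker and the output line format are illustrative choices, not something syntastic requires.

```csharp
// Assumes a package reference to Microsoft.CodeAnalysis.CSharp.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class SyntaxOnlyChecker
{
    // Parses C# source text on its own and returns only parser (syntax) errors.
    // No references are supplied, so unresolved namespaces or types are never reported.
    public static List<string> Check(string source, string path = "input.cs")
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source, path: path);
        return tree.GetDiagnostics()
            .Where(d => d.Severity == DiagnosticSeverity.Error)
            .Select(d =>
            {
                // LinePosition is 0-based; report 1-based line/column like most compilers do.
                var pos = d.Location.GetLineSpan().StartLinePosition;
                return $"{path}({pos.Line + 1},{pos.Character + 1}): error {d.Id}: {d.GetMessage()}";
            })
            .ToList();
    }
}
```

A small console wrapper could read the file, print each returned line, and exit with a non-zero code when the list is not empty.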
I've used NRefactory, from the ICSharpCode SharpDevelop IDE, before. It's quick and easy for basic stuff.
See this article:
Using NRefactory for analyzing C# code
I use it for creating VB.NET examples from C# examples. The method that does this is really straightforward and can easily be adapted to your needs:
private static void ConvertLanguage(TextReader input, TextWriter output, SupportedLanguage language, Action<string> onError)
{
    using (IParser parser = ParserFactory.CreateParser(language, input))
    {
        parser.Parse();
        var specials = parser.Lexer.SpecialTracker.RetrieveSpecials();
        var result = parser.CompilationUnit;
        //if (parser.Errors.Count > 0)
        //    MessageBox.Show(parser.Errors.ErrorOutput, "Parse errors");
        IOutputAstVisitor outputVisitor;
        if (language == SupportedLanguage.CSharp)
            outputVisitor = new VBNetOutputVisitor();
        else
            outputVisitor = new CSharpOutputVisitor();
        outputVisitor.Options.IndentationChar = ' ';
        outputVisitor.Options.IndentSize = 4;
        outputVisitor.Options.TabSize = 4;
        using (SpecialNodesInserter.Install(specials, outputVisitor))
            result.AcceptVisitor(outputVisitor, null);
        if (outputVisitor.Errors.Count > 0 && onError != null)
            onError(outputVisitor.Errors.ErrorOutput);
        output.Write(outputVisitor.Text);
    }
}
Note: The preceding code is from an older version and may not compile against the latest version of the NRefactory library.
I think I may have a solution to your question. If you are trying to check the syntax of your code without running it in the debugger, you can use an online compiler such as compilr.
If you wish to output the results, you can use the Html Agility Pack library to scrape the results from the online compiler with ease. Hope this helped!
I have a very simple grammar that (I think) should only allow additions of two elements like 1+1 or 2+3
grammar dumbCalculator;
expression: simple_add EOF;
simple_add: INT ADD INT;
INT:('0'..'9');
ADD : '+';
I generate my C# classes using the official ANTLR jar file
java -jar "antlr-4.9-complete.jar" C:\Users\Me\source\repos\ConsoleApp2\ConsoleApp2\dumbCalculator.g4 -o C:\Users\Me\source\repos\ConsoleApp2\ConsoleApp2\Dumb -Dlanguage=CSharp -no-listener -visitor
No matter what I try, the parser keeps adding the trailing elements although they shouldn't be allowed.
For example, "1+1+1" is parsed without any error into this tree:
expression
simple_add
1
+
1
+
1
This happens even though I specifically wrote that expression must be simple_add followed by EOF, and simple_add is just INT ADD INT. I have no idea why the rest is being accepted; I expect ANTLR to throw an exception on this.
This is how I test my parser:
var inputStream = new AntlrInputStream("1+1+1");
var lexer = new dumbCalculatorLexer(inputStream);
lexer.RemoveErrorListeners();
lexer.AddErrorListener(new ThrowExceptionErrorListener());
var commonTokenStream = new CommonTokenStream(lexer);
var parser = new dumbCalculatorParser(commonTokenStream);
parser.RemoveErrorListeners();
parser.AddErrorListener(new ThrowExceptionErrorListener());
var ex = parser.expression();
ExploreAST(ex);
Why is the rest of the input being accepted?
Classical scenario, I find my error 5 minutes after posting on Stack Overflow.
For anyone encountering a similar scenario, this happened because I did not explicitly set the ErrorHandler on my parser.
I naively expected the listeners added with AddErrorListener to handle all errors, but apparently something specific is needed if you want errors to be raised before visiting the tree.
I needed to add
parser.ErrorHandler = new BailErrorStrategy();
After this, I indeed got the exceptions on wrong input strings.
This is probably not the right thing to do; I'll let someone who knows ANTLR better comment on this.
For a project refactoring, I need to execute 4 regex search-and-replace operations on 80+ classes. Since the regexes are very long, I'm currently copying and pasting them from a text file... Ditto (a clipboard manager) speeds things up a little, but better automation would be appreciated! I tried a couple of macro plug-ins, but they don't work very well or are too complicated (using EnvDTE). Has anyone needed to accomplish a similar task and found a solution to suggest?
Thanks for your help!
You can try my Visual Commander extension to automate this task. For example, to execute a search and replace with regex use the following code:
public void Run(EnvDTE80.DTE2 DTE, Microsoft.VisualStudio.Shell.Package package)
{
    int options = (int)(EnvDTE.vsFindOptions.vsFindOptionsRegularExpression |
                        EnvDTE.vsFindOptions.vsFindOptionsMatchCase |
                        EnvDTE.vsFindOptions.vsFindOptionsMatchInHiddenText |
                        EnvDTE.vsFindOptions.vsFindOptionsSearchSubfolders |
                        EnvDTE.vsFindOptions.vsFindOptionsKeepModifiedDocumentsOpen);
    // Verbatim strings (@"...") keep regex backslashes intact; "" is an escaped quote.
    // Example rule: turns .Register("Name" into .Register(nameof(Name)
    DTE.Find.FindReplace(EnvDTE.vsFindAction.vsFindActionReplaceAll,
        @"(\.Register\w*)\(""([^""]+)""",
        options,
        @"$1(nameof($2)",
        // Only the active file; EnvDTE.vsFindTarget.vsFindTargetSolution covers the whole solution.
        EnvDTE.vsFindTarget.vsFindTargetCurrentDocument);
}
See DTE.Find.FindReplace documentation for more details.
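If you would rather run the replacements outside Visual Studio, a small console program using System.Text.RegularExpressions can apply several regexes in sequence to every .cs file in a folder. This is a different technique from the Visual Commander macro, sketched under assumptions: the class and method names are hypothetical, only the Register/nameof rule comes from the macro, and File.ReadAllText/WriteAllText may not preserve each file's original encoding or BOM, so run it on a committed working copy.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class BatchRegexReplace
{
    // Applies each (pattern, replacement) pair in order; each rule sees the previous rule's output.
    public static string ApplyAll(string text, IEnumerable<(string Pattern, string Replacement)> rules)
    {
        foreach (var (pattern, replacement) in rules)
            text = Regex.Replace(text, pattern, replacement);
        return text;
    }

    // Rewrites every *.cs file under rootDir in place and returns how many files changed.
    public static int ApplyToDirectory(string rootDir, IReadOnlyList<(string Pattern, string Replacement)> rules)
    {
        int changed = 0;
        foreach (string file in Directory.EnumerateFiles(rootDir, "*.cs", SearchOption.AllDirectories))
        {
            string original = File.ReadAllText(file);
            string updated = ApplyAll(original, rules);
            if (updated != original)
            {
                File.WriteAllText(file, updated);
                changed++;
            }
        }
        return changed;
    }
}
```

Keeping the four long regexes in one list in code, instead of a text note, also makes them easy to rerun after a failed attempt.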
After reading the following:
CLR Needed for C# 6.0
Does C# 6.0 work for .NET 4.0
it seemed to me that, aside from string interpolation, any project I compiled in VS2015 against .NET 4.5.1 could use the new C# language features.
However, I tried the following code on my dev machine using VS2015 targeting 4.5.1:
string varOne = "aaa";
string varTwo = $"{varOne}";
if (varTwo == "aaa")
{
}
and not only did I not receive a compiler error, it worked: varTwo contained aaa, as expected.
Can someone explain why this is the case as I would not have expected this to work? I am guessing I am missing what FormattableString really means. Can someone give me an example?
As mentioned in the comments, string interpolation works in this case because all the new compiler does is convert the expression into an "equivalent string.Format call" at compile time.
From https://msdn.microsoft.com/en-us/magazine/dn879355.aspx
String interpolation is transformed at compile time to invoke an equivalent string.Format call. This leaves in place support for localization as before (though still with traditional format strings) and doesn’t introduce any post compile injection of code via strings.
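To see the equivalence the article describes, compare an interpolated string with the string.Format call it stands for. This is a small illustrative sketch with hypothetical names; neither line needs FormattableString, which is why it compiles against .NET 4.5.1. Newer compilers may lower the interpolation to string.Concat or, from C# 10, to DefaultInterpolatedStringHandler instead, but for the same current culture the resulting text is the same.

```csharp
using System;

public static class InterpolationDemo
{
    // Returns the interpolated form and the explicit string.Format form side by side.
    public static (string Interpolated, string Formatted) Compare(string varOne, int hour)
    {
        // Target type is string, so no FormattableString is needed at run time.
        string interpolated = $"{varOne} at {hour:D2}";
        // The call the interpolation is equivalent to: placeholders and format specifiers map 1:1.
        string formatted = string.Format("{0} at {1:D2}", varOne, hour);
        return (interpolated, formatted);
    }
}
```

For example, Compare("aaa", 9) returns "aaa at 09" for both forms.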
FormattableString is a new class that allows you to inspect the interpolated string before it is rendered, so you can check the values and protect against injection attacks.
// this does not require .NET 4.6
DateTime now = DateTime.Now;
string s = $"Hour is {now.Hour}";
Console.WriteLine(s);
//Output: Hour is 13
// this requires >= .NET 4.6
FormattableString fs = $"Hour is {now.Hour}";
Console.WriteLine(fs.Format);
Console.WriteLine(fs.GetArgument(0));
//Output: Hour is {0}
//13
Can someone explain why this is the case as I would not have expected this to work?
This works because you're compiling with the new Roslyn compiler that ships with VS2015, which knows how to parse the string interpolation syntactic sugar (it simply calls the proper overload of string.Format). If you tried to take advantage of the .NET Framework 4.6 type that makes string interpolation more flexible, FormattableString (which is also what converting an interpolated string to IFormattable relies on), you'd run into a compile-time error, unless you add the required types yourself by defining System.FormattableString and System.Runtime.CompilerServices.FormattableStringFactory in your own project.
I am guessing I am missing what FormattableString really means.
FormattableString is a new type introduced in .NET 4.6 that allows you to use the new string interpolation feature with a custom IFormatProvider of your choice. Since this can't be done once the interpolated string has already been rendered to a string, you can use FormattableString.ToString(IFormatProvider), which can be passed any custom format provider.
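A short sketch of the ToString(IFormatProvider) idea, assuming .NET Framework 4.6 or later (or .NET Core / .NET 5+). Assigning the interpolated string to a FormattableString variable defers formatting, so the provider can be chosen afterwards; FormattableString.Invariant is the built-in shortcut for the invariant culture. The helper names are illustrative.

```csharp
using System;
using System.Globalization;

public static class FormattableDemo
{
    // Formatting is deferred: fs keeps "Value: {0}" and the argument separately.
    public static string Render(double value, IFormatProvider provider)
    {
        FormattableString fs = $"Value: {value}";
        // The caller decides how numbers, dates, etc. are rendered.
        return fs.ToString(provider);
    }

    // Built-in shortcut equivalent to ToString(CultureInfo.InvariantCulture).
    public static string RenderInvariant(double value)
    {
        return FormattableString.Invariant($"Value: {value}");
    }
}
```

For example, passing new NumberFormatInfo { NumberDecimalSeparator = "," } as the provider renders 1.5 as "Value: 1,5", while RenderInvariant(1.5) gives "Value: 1.5" regardless of the machine's culture.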
After looking through posts for good C# parser generators, I stumbled across GPLEX and GPPG. I'd like to use GPLEX to generate tokens for GPPG to parse and create a tree (similar to the lex/yacc relationship). However, I can't seem to find an example on how these two interact together. With lex/yacc, lex returns tokens that are defined by yacc, and can store values in yylval. How is this done in GPLEX/GPPG (it is missing from their documentation)?
Attached is the lex code I would like to convert over to GPLEX:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[Oo][Rr] return OR;
[Aa][Nn][Dd] return AND;
[Nn][Oo][Tt] return NOT;
[A-Za-z][A-Za-z0-9_]* yylval=yytext; return ID;
%%
Thanks!
Andrew
First, add a reference to "QUT.ShiftReduceParser.dll" in your project. It is provided in the GPLEX download package.
Sample-Code for Main-Program:
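using System;
using System.IO;
using ....;
using QUT.Gppg;
using Scanner;
using Parser;
namespace NCParser
{
    class Program
    {
        static void Main(string[] args)
        {
            string pathTXT = @"C:\temp\testFile.txt";
            using (FileStream file = new FileStream(pathTXT, FileMode.Open))
            {
                // Fully qualified because the namespaces Scanner and Parser share their names
                // with the generated classes; a bare "Scanner" would resolve to the namespace.
                Scanner.Scanner scanner = new Scanner.Scanner();
                scanner.SetSource(file, 0);
                Parser.Parser parser = new Parser.Parser(scanner);
                // Run the parse; Parse() returns false if a syntax error was found.
                bool ok = parser.Parse();
                Console.WriteLine(ok ? "Parsed OK" : "Syntax error");
            }
        }
    }
}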
Sample-Code for GPLEX:
%using Parser; //include the namespace of the generated Parser-class
%Namespace Scanner //names the Namespace of the generated Scanner-class
%visibility public //visibility of the types "Tokens","ScanBase","Scanner"
%scannertype Scanner //names the Scannerclass to "Scanner"
%scanbasetype ScanBase //names the Scanbaseclass to "ScanBase"
%tokentype Tokens //names the Tokenenumeration to "Tokens"
%option codePage:65001 out:Scanner.cs /*see the documentation of GPLEX for further Options you can use */
%{ //user-specified code will be copied in the Output-file
%}
OR [Oo][Rr]
AND [Aa][Nn][Dd]
Identifier [A-Za-z][A-Za-z0-9_]*
%% //Rules Section
%{ //user-code that will be executed before getting the next token
%}
{OR} {return (int)Tokens.kwOR;}
{AND} {return (int)Tokens.kwAND;}
{Identifier} {yylval = yytext; return (int)Tokens.ID;}
%% //User-code Section
Sample-Code for GPPG-input-file:
%using Scanner //include the Namespace of the scanner-class
%output=Parser.cs //names the output-file
%namespace Parser //names the namespace of the Parser-class
%parsertype Parser //names the Parserclass to "Parser"
%scanbasetype ScanBase //names the ScanBaseclass to "ScanBase"
%tokentype Tokens //names the Tokensenumeration to "Tokens"
%token kwAND "AND", kwOR "OR" //the received Tokens from GPLEX
%token ID
%% //Grammar Rules Section
program : /* nothing */
| Statements
;
Statements : EXPR "AND" EXPR
| EXPR "OR" EXPR
;
EXPR : ID
;
%% //User-code Section
// Don't forget to declare the Parser-Constructor
public Parser(Scanner.Scanner scnr) : base(scnr) { } // Scanner.Scanner: namespace and class share a name
I had a similar issue - not knowing how to use my output from GPLEX with GPPG due to an apparent lack of documentation. I think the problem stems from the fact that the GPLEX distribution includes gppg.exe along with gplex.exe, but only documentation for GPLEX.
If you go to the GPPG homepage and download that distribution, you'll get the documentation for GPPG, which describes the requirements for the input file, how to construct your grammar, etc. Oh, and you'll also get both binaries again: gppg.exe and gplex.exe.
It almost seems like it would be simpler to just include everything in one package. It could definitely clear up some confusion, especially for those who may be new to lexical analysis (tokenization) and parsing (and may not be 100% familiar yet with the differences between the two).
So anyway, for those who may be doing this for the first time:
GPLEX http://gplex.codeplex.com - used for tokenization/scanning/lexical analysis (same thing)
GPPG http://gppg.codeplex.com/ - takes output from a tokenizer as input to parse. For example, parsers use grammars and can do things a simple tokenizer cannot, like detect whether sets of parentheses match up.
Some time ago I had the same need to use GPLEX and GPPG together, and to make the job much easier I created a NuGet package for using GPPG and GPLEX together in Visual Studio.
This package can be installed in C# projects based on the .NET Framework and adds some cmdlets to the Package Manager Console in Visual Studio. These cmdlets help you configure the C# project to integrate GPPG and GPLEX into the build process. Essentially, in your project you edit YACC and LEX files as source code, and during the build the parser and the scanner are generated. In addition, the cmdlets add to the project the files needed for customizing the parser and the scanner.
You can find it here:
https://www.nuget.org/packages/YaccLexTools/
And here is a link to the blog post that explains how to use it:
http://ecianciotta-en.abriom.com/2013/08/yacclex-tools-v02.html
Have you considered using Roslyn? (This isn't a proper answer but I don't have enough reputation to post this as a comment)
Ironically, when I jumped into parsers in C# (about a year ago), I started with exactly those two tools. The lexer had a tiny bug (easy to fix):
http://gplex.codeplex.com/workitem/11308
but the parser had a more severe one:
http://gppg.codeplex.com/workitem/11344
The lexer bug should be fixed (release date June 2013), but the parser probably still has its bug (last release May 2012).
So I wrote my own suite :-) https://sourceforge.net/projects/naivelangtools/ and use and develop it since then.
Your example translates (in NLT) to:
/[Oo][Rr]/ -> OR;
/[Aa][Nn][Dd]/ -> AND;
/[Nn][Oo][Tt]/ -> NOT;
// by default text is returned as value
/[A-Za-z][A-Za-z0-9_]*/ -> ID;
The entire suite is similar to lex/yacc; where possible, it does not rely on side effects (so you return the appropriate value instead).
I am trying to parse out some information from Google's geocoding API but I am having a little trouble with efficiently getting the data out of the xml. See link for example
All I really care about is getting the short_name from address_component where the type is administrative_area_level_1 and the long_name from administrative_area_level_2
However, in my test program neither XPath query returns any results.
public static void Main(string[] args)
{
    using(WebClient webclient = new WebClient())
    {
        webclient.Proxy = null;
        string locationXml = webclient.DownloadString("http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
        using(var reader = new StringReader(locationXml))
        {
            var doc = new XPathDocument(reader);
            var nav = doc.CreateNavigator();
            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type=administrative_area_level_1]/short_name").InnerXml);
            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type=administrative_area_level_2]/long_name").InnerXml);
        }
    }
}
Can anyone help me find what I am doing wrong, or recommending a better way?
You need to put the value of the node you're looking for in quotes (note the single quotes around administrative_area_level_1):
".../address_component[type='administrative_area_level_1']/short_name"
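A self-contained sketch of the corrected query, run against a hand-trimmed copy of one address_component (the California element quoted at the end of this thread) rather than the live API, so no network call or API key is assumed. SelectSingleNode returns null when nothing matches, which is why the unquoted predicate in the question ends in a NullReferenceException on .InnerXml: without quotes, administrative_area_level_1 is read as the name of a child element, not as a string. The helper name is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class GeocodeXPath
{
    // Returns the string value of the first node matching xpath, or null if nothing matches.
    public static string SelectText(string xml, string xpath)
    {
        using (var reader = new StringReader(xml))
        {
            XPathNavigator nav = new XPathDocument(reader).CreateNavigator();
            // SelectSingleNode returns null (it does not throw) when there is no match.
            XPathNavigator node = nav.SelectSingleNode(xpath);
            return node?.Value;
        }
    }
}
```

With that sample, the quoted predicate returns "CA", while the question's unquoted predicate returns null. Because `[type='...']` compares against every `type` child, it also matches when the wanted value is not the first `type` element.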
I'd definitely recommend using LINQ to XML instead of XPathNavigator. It makes XML querying a breeze, in my experience. In this case I'm not sure exactly what's wrong... but I'll come up with a LINQ to XML snippet instead.
using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;
class Test
{
    public static void Main(string[] args)
    {
        using(WebClient webclient = new WebClient())
        {
            webclient.Proxy = null;
            string locationXml = webclient.DownloadString
                ("http://maps.google.com/maps/api/geocode/xml?address=1600"
                 + "+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
            XElement root = XElement.Parse(locationXml);
            XElement result = root.Element("result");
            Console.WriteLine(result.Elements("address_component")
                                    .Where(x => (string) x.Element("type") ==
                                           "administrative_area_level_1")
                                    .Select(x => x.Element("short_name").Value)
                                    .First());
            Console.WriteLine(result.Elements("address_component")
                                    .Where(x => (string) x.Element("type") ==
                                           "administrative_area_level_2")
                                    .Select(x => x.Element("long_name").Value)
                                    .First());
        }
    }
}
Now this is more code (see note 1 below)... but I personally find it easier to get right than XPath, because the compiler is helping me more.
EDIT: I feel it's worth going into a little more detail about why I generally prefer code like this over using XPath, even though it's clearly longer.
When you use XPath within a C# program, you have two different languages - but only one is in control (C#). XPath is relegated to the realm of strings: Visual Studio doesn't give an XPath expression any special handling; it doesn't understand that it's meant to be an XPath expression, so it can't help you. It's not that Visual Studio doesn't know about XPath; as Dimitre points out, it's perfectly capable of spotting errors if you're editing an XSLT file, just not a C# file.
This is the case whenever you have one language embedded within another and the tool is unaware of it. Common examples are:
SQL
Regular expressions
HTML
XPath
When code is presented as data within another language, the secondary language loses a lot of its tooling benefits.
While you can context switch all over the place, pulling out the XPath (or SQL, or regular expressions etc) into their own tooling (possibly within the same actual program, but in a separate file or window) I find this makes for harder-to-read code in the long run. If code were only ever written and never read afterwards, that might be okay - but you do need to be able to read code afterwards, and I personally believe the readability suffers when this happens.
The LINQ to XML version above only ever uses strings for pure data - the names of elements etc - and uses code (method calls) to represent actions such as "find elements with a given name" or "apply this filter". That's more idiomatic C# code, in my view.
Obviously others don't share this viewpoint, but I thought it worth expanding on to show where I'm coming from.
Note that this isn't a hard and fast rule of course... in some cases XPath, regular expressions etc are the best solution. In this case, I'd prefer the LINQ to XML, that's all.
1 Of course I could have kept each Console.WriteLine call on a single line, but I don't like posting code with horizontal scrollbars on SO. Note that writing the correct XPath version with the same indentation as the above and avoiding scrolling is still pretty nasty:
Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/" +
"address_component[type='administrative_area_level_1']" +
"/short_name").InnerXml);
In general, long lines work a lot better in Visual Studio than they do on Stack Overflow...
I would recommend just typing the XPath expression as part of an XSLT file in Visual Studio. You'll get error messages "as you type"; this is an excellent XML/XSLT/XPath editor.
For example, I am typing:
<xsl:apply-templates select="@* | node() x"/>
and immediately get in the Error List window the following error:
Error 9 Expected end of the expression, found 'x'. @* | node() -->x<--
XSLTFile1.xslt 9 14 Miscellaneous Files
Only when the XPath expression does not raise any errors (and, ideally, after testing that it selects the intended nodes) would I put it into my C# code.
This ensures that I will have no XPath errors, syntactic or semantic, when I run the C# program.
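For a similar syntax check from C# itself, XPathExpression.Compile (System.Xml.XPath) parses an expression without any document and throws XPathException on invalid syntax. A minimal sketch with a hypothetical helper name; it catches only syntax errors. The question's unquoted predicate compiles fine, so testing the expression against real data is still needed.

```csharp
using System;
using System.Xml.XPath;

public static class XPathSyntaxCheck
{
    // Returns null if the expression compiles, otherwise the parser's error message.
    // No document is involved, so this cannot tell whether the expression selects the intended nodes.
    public static string TryCompile(string expression)
    {
        try
        {
            XPathExpression.Compile(expression);
            return null;
        }
        catch (XPathException ex)
        {
            return ex.Message;
        }
    }
}
```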
dtb's response is accurate. I wanted to add that you can use XPath testing tools like the one linked below to help find the correct XPath:
http://www.bit-101.com/xpath/
using System.Linq;
using System.Net;
using System.Xml.Linq;

string url = @"http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false";
string value = "administrative_area_level_1";
using(WebClient client = new WebClient())
{
    string wcResult = client.DownloadString(url);
    XDocument xDoc = XDocument.Parse(wcResult);
    // Keep every address_component with at least one <type> containing the value
    var result = xDoc.Descendants("address_component")
                     .Where(p => p.Descendants("type")
                                  .Any(q => q.Value.Contains(value)));
}
The result is an enumeration of the address_component elements that have at least one type node containing the value you're searching for. For this address, the query yields a single XElement containing the following data:
<address_component>
<long_name>California</long_name>
<short_name>CA</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
I would really recommend spending a little time learning LINQ in general, because it's very useful for manipulating and querying in-memory objects and querying databases, and it tends to be easier than XPath when working with XML. My favorite site to reference is http://www.hookedonlinq.com/
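Building on that query, a sketch that also pulls out the wanted field, run against a trimmed sample rather than the live service. It uses exact equality on the type value instead of Contains, a deliberate change so that one type name cannot match as a substring of another; the method and parameter names are illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class GeocodeLinq
{
    // Finds the first address_component with a <type> child equal to componentType and
    // returns the text of its child element named field (e.g. "short_name"), or null.
    public static string GetComponent(string xml, string componentType, string field)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("address_component")
            .Where(c => c.Elements("type").Any(t => t.Value == componentType))
            // The explicit string conversion yields null if the field element is missing.
            .Select(c => (string)c.Element(field))
            .FirstOrDefault();
    }
}
```

With the California element above, GetComponent(xml, "administrative_area_level_1", "short_name") returns "CA", and an absent component type returns null instead of throwing, unlike First().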