How do I create syntax nodes in Roslyn from scratch?

How do I create syntax nodes in Roslyn from scratch? - c#

I would like to generate syntax nodes with the Roslyn API without having a pre-existing syntax node. That is, I cannot simply use the WithXYZ() methods on an existing object to modify it because there is no existing object.
For example, I would like to generate an InvocationExpressionSyntax object. Assuming a constructor was available, I could do something like
var invoke = new InvocationExpressionSyntax(expression, arguments);
But the constructor for InvocationExpressionSyntax seems to not be public.
http://www.philjhale.com/2012/10/getting-started-with-roslyn.html
this blog suggests that I can use an API such as
Syntax.InvocationExpression()
but I don't see what Syntax refers to, and I don't see anything that resembles it in the Roslyn API.
I did find Microsoft.CodeAnalysis.VisualBasic.SyntaxFactory that lets me do
var invoke = SyntaxFactory.InvocationExpression().WithExpression(expression);
And this works well enough for me. There is also Microsoft.CodeAnalysis.CSharp.SyntaxFactory for anyone wondering.
Is SyntaxFactory the proper way to create new syntax nodes?
The way I found SyntaxFactory.InvocationExpression was by looking at the PublicAPI.txt file in the roslyn source code (https://github.com/dotnet/roslyn) under the src/Compilers/VisualBasic/Portable directory. Otherwise, I don't see where SyntaxFactory is documented.

As the other answer stated, the SyntaxFactory is the correct class to use. As you have found there are two syntax factories available, Microsoft.CodeAnalysis.CSharp.SyntaxFactory and Microsoft.CodeAnalysis.VisualBasic.SyntaxFactory, depending on which language you are using.
Usually the calls into the SyntaxFactory are chained together, so you end up with many calls to the SytnaxFactory methods to generate even simple lines of code. For example, the code Console.WriteLine("A"); would be represented by the following calls to the Syntax Factory:
var console = SyntaxFactory.IdentifierName("Console");
var writeline = SyntaxFactory.IdentifierName("WriteLine");
var memberaccess = SyntaxFactory.MemberAccessExpression(SyntaxKind.SimpleMemberAccessExpression, console, writeline);
var argument = SyntaxFactory.Argument(SyntaxFactory.LiteralExpression(SyntaxKind.StringLiteralExpression, SyntaxFactory.Literal("A")));
var argumentList = SyntaxFactory.SeparatedList(new[] { argument });
var writeLineCall =
SyntaxFactory.ExpressionStatement(
SyntaxFactory.InvocationExpression(memberaccess,
SyntaxFactory.ArgumentList(argumentList)));
If you are unsure of how to generate nodes for some specific code, Kirill Osenkov created the Roslyn Quoter project on GitHub, which you can use to generate the SyntaxFactory code for you.
I recently did a blog post on this topic if you would like to read further.

Yes, the SyntaxFactory type is the way to create syntax nodes from scratch.

Related

Redundant Parentheses in Roslyn SyntaxGenerator AssignmentStatement

I'm using Roslyn to generate code for a tool I'm building. I'm using the SyntaxGenerator (as opposed to SyntaxFactory) as I'd prefer to not have to write implementations for both C# and VB.NET.
According to the reference source, the AssignmentStatement deliberately adds parentheses to the right side node. E.g.
Key = (key)
Whereas what I'm looking for is without parentheses:
Key = key
Example
This is a very basic, very watered down example which should reproduce this behaviour.
public void Example()
{
using (var workspace = new AdhocWorkspace())
{
var generator = SyntaxGenerator.GetGenerator(workspace, LanguageNames.CSharp);
var statement = generator.AssignmentStatement(generator.IdentifierName("Key"), generator.IdentifierName("key"));
var result = statement.NormalizeWhitespace().ToFullString();
}
}
I've searched extensively both here on SO as well as on Github, Roslyn Reference Source and various blogs and examples* but can't seem to find an elegant way to remove these redundant parentheses.
Although this is still legal syntax and will compile, it's really annoying.
Any ideas? Am I doing something wrong? Tips? Hopeless case?
*This example generates a constructor with redundant parentheses in the assignments as well.

Find all method calls for a specific method using Roslyn

I am working on a code analyser using Roslyn and my current task is to find all internal methods which are unused in the assembly.
I start with a MethodDeclarationSyntax and get the symbol from that. I then use the FindCallersAsync method in SymbolFinder, but it returns an empty collection even when I am making a call to the method in question somewhere in the assembly. See the code below.
protected override void Analyze(SyntaxNodeAnalysisContext context)
{
NodeToAnalyze = context.Node;
var methodDeclaration = NodeToAnalyze as MethodDeclarationSyntax;
if (methodDeclaration == null)
return;
var methodSymbol = context.SemanticModel.GetDeclaredSymbol(methodDeclaration) as ISymbol;
if (methodSymbol.DeclaredAccessibility != Accessibility.Internal)
return;
var solutionPath = GetSolutionPath();
var msWorkspace = MSBuildWorkspace.Create();
var solution = msWorkspace.OpenSolutionAsync(solutionPath).Result;
var callers = SymbolFinder.FindCallersAsync(symbol, solution).Result; // Returns empty collection.
...
}
I have seen similar code here, but in that example the method symbol is obtained using GetSymbolInfo on an InvocationExpressionSyntax:
//Get the syntax node for the first invocation to M()
var methodInvocation = doc.GetSyntaxRootAsync().Result.DescendantNodes().OfType<InvocationExpressionSyntax>().First();
var methodSymbol = model.GetSymbolInfo(methodInvocation).Symbol;
//Finds all references to M()
var referencesToM = SymbolFinder.FindReferencesAsync(methodSymbol, doc.Project.Solution).Result;
However, in my case, I need to find the invocations (if any) from a declaration. If I do get the invocation first and pass in the symbol from GetSymbolInfo the calls to the method are returned correctly - so the issue seems to be with the symbol parameter and not solution.
Since I am trying to get the underlying symbol of a declaration, I cannot use GetSymbolInfo, but use GetDeclaredSymbol instead (as suggested here).
My understanding from this article is that the symbols returned from GetDeclaredSymbol and GetSymbolInfo should be the same. However, a simple comparison using Equals returns false.
Does anyone have any idea of what the difference is between the two symbols returned and how I can get the 'correct' one which works? Or perhaps there is a better approach entirely? All my research seems to point to FindCallersAsync, but I just can't get it to work.

My understanding from this article is that the symbols returned from GetDeclaredSymbol and GetSymbolInfo should be the same. However, a simple comparison using Equals returns false.
This is because they're not the same symbol; they are coming from entirely different compilations which might or might not be different. One is coming from the compiler that is actively compiling, one is coming from MSBuildWorkspace.
Fundamentally, using MSBuildWorkspace in an analyzer is unsupported. Completely. Don't do that. Not only would that be really slow, but it also has various correctness issues, especially if you're running your analyzer in Visual Studio. If your goal is to find unused methods anywhere in a solution, that's something we don't really support implementing as an analyzer either, since that involves cross-project analysis.

Is there .net magic to get parameter values by name in console application?

I've been developing .net console applications using C# and have always just dictated what order parameters must be inserted in so that args[0] is always start date and args[1] is always end date, for example.
however I would like to move over to using named parameters so that any combination of parameters can be sent in any order, such as the typical "-sd" would prefix a start date.
I know I could parse through the args[] looking for "-" and then read the name and look the next position for the accompanying value, but before doing that wanted to see if there was any kind of baked in handling for this rather standard practice.
is there something like this out there already that could do as such:
DateTime startDate = (DateTime)((ConsoleParameters)args[])["sd"]
I'm using C# and .Net 4

There is nothing built into the core framework.
A lot of people think NDesk.Options is useful for this sort of thing. Check out this example (taken directly from the provided link):
string data = null;
bool help = false;
int verbose = 0;
var p = new OptionSet () {
{ "file=", v => data = v },
{ "v|verbose", v => { ++verbose } },
{ "h|?|help", v => help = v != null },
};
List<string> extra = p.Parse (args);

Yes, the "magic" is that this is a common problem and it has been adequately solved. So I recommend using an already written library to handle parsing command line arguments.
CommandLineParser has been great for me. It is reasonably documented and flexible enough for every type of command line argument I've wanted to handle. Plus, it assists with usage documentation.
I will say that I'm not the biggest fan of making a specific class that has to be adorned with attributes to use this library, but it's a minor point considering that it solves my problem. And in reality forcing that attributed class pushes me to keep that class separate from where my app actually retrieves it's settings from and that always seems to be a better design.

You can use NDesk.Options.

There is no such a thing as named parameters. "-sd" is just a choice for a specific application. It can be "/sd" as well. Or "sd=". Or whatever you want.
Since there are no named parameters, there is nothing inside .NET Framework which let you use the "-sd" syntax.
But you can quite easily build your own method to get a set of "named parameters" for your app.
Edit: or, even better, you can use an existing library, like suggested in other answers.
Edit: reading the answer by #Sander Rijken, I see that I was wrong: there were still an implementation of "-sd" syntax in .NET 4.0 before the release. But since it was dropped before the final release, the only ways are still to create your own method or to use an existing library.

Regex: C# method declaration parsing

Could somebody help me parse following from the C# method declaration: scope, isStatic, name, return type and list of the parameters and their types. So given method declaration like this
public static SomeReturnType GetSomething(string param1, int param2)
etc. I need to be able to parse it and get the info above. So in this case
name = "GetSomething"
scope = "public"
isStatic = true
returnType = "SomeReturnType"
and then array of parameter type and name pairs.
Oh almost forgot the most important part. It has to account for all other scopes (protected, private, internal, protected internal), absence of "static", void return type etc.
Please note that REFLECTION is not solution here. I need REGEX.
So far I have these two:
(?:(?:public)|(?:private)|(?:protected)|(?:internal)|(?:protected internal)\s+)*
(?:(?:static)\s+)*
I guess for rest of the problem I can just get away with string manipulation without regex.

Some thoughts on your problem:
A set of strings that can all be matched by a particular regular expression is called a regular language. The set of strings which are legal method declarations is not a regular language in any version of C#. If you are attempting to find a regular expression which matches every legal C# method declaration and rejects every illegal C# method declaration then you are out of luck.
More generally, regular expressions are almost always a bad idea for anything but the simplest matching problems. (Sorry Jeff.) A far better approach is to first write a lexer, which breaks up the string into a sequence of tokens. Then analyze the token sequence. (Using regular expressions as part of a lexer is not a terrible idea, though you can get by without them.)
I note also that you are glossing over rather a lot of complications in parsing method declarations. You did not mention:
generic/array/pointer/nullable return and formal parameter types
generic type parameter declarations
generic type parameter constraints
unsafe/extern/new/override/virtual/abstract/sealed methods
explicit interface implementation methods
method/parameter/return attributes
partial methods -- slightly tricky to parse, partial is a contextual keyword
comments
I also note that you've not said whether you are guaranteed that the method signature is already good, or if you need to identify bad ones and produce diagnostics as to why they're bad. That's a much harder problem.
Why do you want to do this in the first place? Doing this correctly is rather a lot of work. Perhaps there is an easier way to get what you want?

I wouldn't bother with using Regex. When you get to the part of interpreting method parameters, it gets really messy (ref and out keywords for example). I don't know if you need support for attribute notation as well, but that would make it a complete mess.
Maybe a C# parser library can be of help. I've found a few on the internet:
http://www.codeplex.com/csparser (C# 1.0)
http://www.csharpparser.com/
Alternatively, you could first feed the code to the compiler at runtime, and then use reflection on the newly created assembly. It will be slower, but pretty much guaranteed to be correct. Even though you seem to be opposed to the idea of using reflection, this can be a viable solution.
Something like this:
List<string> referenceAssemblies = new List<string>()
{
"System.dll"
// ...
};
string source = "public abstract class TestClass {" + input + ";}";
CSharpCodeProvider codeProvider = new CSharpCodeProvider();
// No assembly name specified
CompilerParameters compilerParameters =
new CompilerParameters(referenceAssemblies.ToArray());
compilerParameters.GenerateExecutable = false;
compilerParameters.GenerateInMemory = false;
CompilerResults compilerResults = codeProvider.CompileAssemblyFromSource(
compilerParameters, source);
// Check for successful compilation here
Type testClass = compilerResults.CompiledAssembly.GetTypes().First();
Then use reflection on testClass.
Compiling should be safe without input validation, because you're not executing any of the code. You'd only need very basic checks, such as making sure only 1 method signature is entered.

Well given the rules you've provided, it would probably be best to use a series of regular expressions rather than trying to come up with a singular expression. That expression would be enormous.
If you're sold on a singular expression, you'll need to use a regular expression that uses grouping, look-ahead and look-behind.
http://www.regular-expressions.info/lookaround.html
Even with the limited scope of what you're trying to parse out of it, you'll still need some very specific guidelines on all possibilities.

string test = #"public static SomeReturnType GetSomething(string param1, int param2)";
var match = Regex.Match(test, #"(?<scope>\w+)\s+(?<static>static\s+)?(?<return>\w+)\s+(?<name>\w+)\((?<parms>[^)]+)\)");
Console.WriteLine(match.Groups["scope"].Value);
Console.WriteLine(!string.IsNullOrEmpty(match.Groups["static"].Value));
Console.WriteLine(match.Groups["return"].Value);
Console.WriteLine(match.Groups["name"].Value);
List<string> parms = match.Groups["parms"].ToString().Split(',').ToList();
parms.ForEach(x => Console.WriteLine(x));
Console.Read();
Broken for parms with commas, but it's quite possible to also handle that.

(?<StringRepresentation>\A\s*(?:(?:(?<Comment>(?://.*\n)|(?:/\*(?:[\w\d!##$%^&*()\[\]<>,.;\\"':|{}`~+=-_?\s]*)?\*/))|(\[\s*(?<Attributes>\w*)[^\[\]]*?\]))\s*)*?(?:(?:(?<Access>protected\s+internal|internal\s+protected|private|public|protected|internal)\s+)?(?:(?<InheritanceModifier>new|abstract|override|virtual)\s+)?(?:(?<Static>static)\s+)?(?:(?<Extern>extern)\s+)?(?:partial\s+)?)+(?:(?<Type>\w+(?:[\w,.\?\[\]])*?(?:\<.*>)*?)\s+)?(?<Operator>operator\s+)?\s*(?<Name>~?(?:[\w\=+\-\!\~\d\.])+?)\s*(?:\<(?:\w\.*\d*\,*\s*)+\>)*\s*\((?<Parameters>(?:[^()])*?)\)\s*(?:where\s+.+)?\s*(?:\:\s*(?:this|base)\s*(?:\(?[^\(\)]*(?:(?:(?:(?<OpenC>\()[^\(\)]*)+(?:(?<CloseC-OpenC>\))[^\(\)]*?)+)*(?(OpenC)(?!))\)))\s*)?(?:;|(?<ah>\{[^\{\}]*(?:(?:(?:(?<Open>\{)[^\{\}]*)+(?:(?<Close-Open>\})[^\{\}]*?)+)*(?(Open)(?!))\}))))
I can't personally take credit for this one, but the guy who made Regionerate (open source) came up with this and it works pretty well for parsing methods in general.

Is there a way I can dynamically define a Predicate body from a string containing the code?

This is probably a stupid question, but here goes. I would like to be able to dynamically construct a predicate < T > from a string parsed from a database VARCHAR column, or any string, for that matter. For example, say the column in the database contained the following string:
return e.SomeStringProperty.Contains("foo");
These code/string values would be stored in the database knowing what the possible properties of the generic "e" is, and knowing that they had to return a boolean. Then, in a magical, wonderful, fantasy world, the code could execute without knowing what the predicate was, like:
string predicateCode = GetCodeFromDatabase();
var allItems = new List<SomeObject>{....};
var filteredItems = allItems.FindAll(delegate(SomeObject e) { predicateCode });
or Lambda-ized:
var filteredItems = allItems.FindAll(e => [predicateCode]);
I know it can probably never be this simple, but is there a way, maybe using Reflection.Emit, to create the delegate code dynamically from text and give it to the FindAll < T > (or any other anonymous/extension) method?

The C# and VB compilers are available from within the .NET Framework:
C# CodeDom Provider
Be aware though, that this way you end up with a separate assembly (which can only be unloaded if it's in a separate AppDomain). This approach is only feasible if you can compile all the predicates you are going to need at once. Otherwise there is too much overhead involved.
System.Reflection.Emit is a great API for dynamically emitting code for the CLR. It is, however, a bit cumbersome to use and you must learn CIL.
LINQ expression trees are an easy to use back-end (compilation to CIL) but you would have to write your own parser.
I suggest you have a look at one of the "dynamic languages" that run on the CLR (or DLR) such as IronPython. It's the most efficient way to implement this feature, if you ask me.

Check out the Dynamic Linq project it does all this and more!
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
Great for simple stuff like user selected orderby's or where clauses

It is possible using emit, but you'd be building your own parser.
EDIT
I remember that in ScottGu's PDC keynote, he showed a feature using the CLI version of the .net framework that resembled Ruby's eval, but I can't find a URL that can corroborate this. I'm making this a commnity wiki so that anyone who has a good link can add it.

I stepped off the dynamic linq because it's limited in ways I want to search a collection, unless you prove me wrong.
My filter needs to be: in a list of orders, filter the list so that I have only the orders with in the collection of items in that order, an item with the name "coca cola".
So that will result to a method of: orders.Findall(o => o.Items.Exists(i => i.Name == "coca cola"))
In dynamic linq I didn't find any way to do that, so I started with CodeDomProvicer.
I created a new Type with a method which contains my dynamically built FindAll Method:
public static IList Filter(list, searchString)
{
// this will by dynamically built code
return orders.Findall(o => o.Items.Exists(i => i.Name == "coca cola"));
}
when I try to build this assembly:
CompilerResults results = provider.CompileAssemblyFromSource(parameters, sb.ToString());
I'm getting the error:
Invalid expression term ">"
Why isn't the compiler able to compile the predicate?

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How do I create syntax nodes in Roslyn from scratch? - c#

Yes, the SyntaxFactory type is the way to create syntax nodes from scratch.

Related

Redundant Parentheses in Roslyn SyntaxGenerator AssignmentStatement

Find all method calls for a specific method using Roslyn

Is there .net magic to get parameter values by name in console application?

Regex: C# method declaration parsing

Is there a way I can dynamically define a Predicate body from a string containing the code?

Categories

Resources