I am researching ways, tools and techniques to parse code files in order to support syntax highlighting and IntelliSense in an editor written in C#.
Does anyone have any ideas, patterns & practices, tools or techniques for that?
EDIT: A nice source of info for anyone interested:
Parsing beyond Context-free grammars
ISBN 978-3-642-14845-3
My favourite parser for C# is Irony: http://irony.codeplex.com/ - I have used it a couple of times with great success.
Here is a Wikipedia page listing many more: http://en.wikipedia.org/wiki/Compiler-compiler
There are two basic approaches:
1) Parse the entire solution and everything it references so you understand all the types involved in the code
2) Parse locally and do your best to guess what types etc are.
The trouble with (2) is that you have to guess, and in some circumstances you just can't tell from a code snippet exactly what everything is. But if you're happy with the sort of syntax highlighting shown on (e.g.) Stack Overflow, then this approach is easy and quite effective.
To do (1), you need to do one of the following (in decreasing order of difficulty):
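For approach (2), a lot can be done purely lexically: split the text into tokens and color each token by its kind, without knowing any types. Below is a minimal hand-written tokenizer sketch under stated assumptions: the TokenKind names, the Highlighter class and the abbreviated keyword list are hypothetical, and it does not handle verbatim (@"...") or interpolated strings, char literals, or preprocessor directives.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token categories that a highlighter would map to colors.
public enum TokenKind { Keyword, Identifier, Number, String, Comment, Punctuation, Whitespace }

public sealed class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
}

public static class Highlighter
{
    // Abbreviated list; a real highlighter would include every C# keyword.
    private static readonly HashSet<string> Keywords = new HashSet<string>
    {
        "class", "public", "private", "static", "void", "int", "string", "if", "else",
        "for", "foreach", "while", "return", "new", "var", "using", "namespace"
    };

    // Splits source text into tokens; concatenating the token texts gives back the input.
    public static List<Token> Tokenize(string src)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < src.Length)
        {
            int start = i;
            char c = src[i];
            char next = i + 1 < src.Length ? src[i + 1] : '\0';
            TokenKind kind;
            if (char.IsWhiteSpace(c))
            {
                while (i < src.Length && char.IsWhiteSpace(src[i])) i++;
                kind = TokenKind.Whitespace;
            }
            else if (c == '/' && next == '/')
            {
                while (i < src.Length && src[i] != '\n') i++;   // line comment
                kind = TokenKind.Comment;
            }
            else if (c == '/' && next == '*')
            {
                int end = src.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? src.Length : end + 2;              // block comment (may be unterminated)
                kind = TokenKind.Comment;
            }
            else if (c == '"')
            {
                i++;
                while (i < src.Length && src[i] != '"' && src[i] != '\n')
                    i += src[i] == '\\' ? 2 : 1;                 // skip escapes such as \"
                if (i > src.Length) i = src.Length;              // text ended right after a backslash
                else if (i < src.Length && src[i] == '"') i++;   // include the closing quote
                kind = TokenKind.String;
            }
            else if (char.IsDigit(c))
            {
                while (i < src.Length && (char.IsLetterOrDigit(src[i]) || src[i] == '.')) i++;
                kind = TokenKind.Number;
            }
            else if (char.IsLetter(c) || c == '_')
            {
                while (i < src.Length && (char.IsLetterOrDigit(src[i]) || src[i] == '_')) i++;
                kind = Keywords.Contains(src.Substring(start, i - start)) ? TokenKind.Keyword : TokenKind.Identifier;
            }
            else
            {
                i++;
                kind = TokenKind.Punctuation;
            }
            tokens.Add(new Token(kind, src.Substring(start, i - start)));
        }
        return tokens;
    }
}
```

An editor would typically re-tokenize only the lines around an edit and map each TokenKind to a color; that incremental part is not shown here.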
Parse all the source code. Not possible if you reference 3rd party assemblies.
Use reflection on the compiled code to garner type information you can use when parsing the source.
Use the host IDE's code element interfaces (if available, so not applicable in your case!) to provide the information you need.
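The reflection option can be sketched with the standard System.Reflection API: once the parser has resolved an expression to a type (here simply typeof(string)), list that type's public members whose names start with what the user has typed. The CompletionSource class and its case-insensitive prefix rule are illustrative assumptions, not how any particular IDE does it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class CompletionSource
{
    // Returns distinct public instance member names of `type` that start with `prefix`,
    // sorted alphabetically, as an IntelliSense-style suggestion list.
    public static List<string> Complete(Type type, string prefix)
    {
        return type.GetMembers(BindingFlags.Public | BindingFlags.Instance)
            // Hide compiler-generated accessors such as get_Length and add_/remove_ event methods.
            .Where(m => !(m is MethodInfo mi && mi.IsSpecialName))
            .Where(m => m.MemberType != MemberTypes.Constructor)
            .Select(m => m.Name)
            .Where(n => n.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .Distinct()   // overloads share one name
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }
}
```

For instance, Complete(typeof(string), "Sub") should include Substring. Types from third-party assemblies can be handled the same way after loading them with Assembly.LoadFrom.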
You could take a look at how SharpDevelop (http://www.icsharpcode.net/) did it. Its authors wrote a book about doing just that, Dissecting a C# Application: Inside SharpDevelop, which even has a chapter called "Implement a parser to provide syntax highlighting and auto-completion as users type".
The customer has millions of lines of code (VB.NET and C#) and wants us to develop a tool to estimate the quality of the code.
The information the customer wants includes:
1) how many lines of comments are in one code file
2) how many functions are implemented in one class
3) whether all possible exceptions have been wrapped in a try/catch block
4) how many attributes are attached to one function
5) ... (the customer said that the tool we provide should be configurable and extensible so that they can implement more ideas later)
We plan to write a Visual Studio add-in that parses the code of the open project on the fly. The interesting part is that we need to parse both C# and VB.NET code.
Please give us some tips on how to start this interesting task.
Thanks in advance!
You ask a very broad question, but you should begin by studying existing parsers' APIs.
Once you do that you're golden.
For example, look at this SO question, which lists some parsers for C#. Of course you could write your own, but I don't see any reason to, since the task isn't easy.
So you get your AST, and once you have it you have all the information you want.
Keep in mind that if you reference a type that isn't in the file, you have to get it from another file, and it could also be a type from the .NET Framework. So there is definitely more work to be done.
To go through your list:
1) How many lines of comments are in one code file: you could find this through your C# parser of choice. They recognize comments as well.
2) How many functions are implemented in one class: likewise, this should be very easy.
3) Whether all possible exceptions have been wrapped in a try/catch block: finding explicit throws is easy (the parser is likely to have a special type for language keywords, so looking for throw should be straightforward). Note, though, that any method call can also throw, so a complete answer needs to know what each called method may throw, not just where throw appears.
4) How many attributes are attached to one function: likewise.
5) ... (the customer said that the tool we provide should be configurable and extensible so that they can implement more ideas later): this shouldn't differ from any other project. Just make sure you're using good design principles: keep everything abstract, use interfaces wisely, work in layers, etc.
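For the comment metric, a real parser is the robust route (Roslyn, for instance, keeps comments as trivia attached to tokens). To show why a plain text search for // is not enough, here is a minimal lexical sketch that counts lines containing a C# comment while ignoring // inside ordinary string literals. The CodeMetrics name is hypothetical, and it does not understand verbatim or interpolated strings, char literals, or VB.NET's ' comments.

```csharp
using System;

public static class CodeMetrics
{
    // Counts lines that contain at least part of a C# comment (// or /* */).
    // Tracks regular "..." strings so that "http://..." is not mistaken for a comment.
    public static int CountCommentLines(string source)
    {
        int count = 0;
        bool inBlock = false;                      // inside a /* ... */ that spans lines
        foreach (string line in source.Split('\n'))
        {
            bool hasComment = inBlock;             // a continuing block comment counts
            bool inString = false;
            for (int i = 0; i < line.Length; i++)
            {
                char c = line[i];
                char next = i + 1 < line.Length ? line[i + 1] : '\0';
                if (inBlock)
                {
                    if (c == '*' && next == '/') { inBlock = false; i++; }
                }
                else if (inString)
                {
                    if (c == '\\') i++;            // skip the escaped character
                    else if (c == '"') inString = false;
                }
                else if (c == '"') inString = true;
                else if (c == '/' && next == '/') { hasComment = true; break; }
                else if (c == '/' && next == '*') { hasComment = true; inBlock = true; i++; }
            }
            if (hasComment) count++;
        }
        return count;
    }
}
```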
You can use Roslyn. For C# you can also use NRefactory.
Have a look at StyleCop; you may be able to add rules that get the information you want.
http://stylecop.codeplex.com/
I know it might not be worth it, but just for educational purposes I want to know if there is a way to inject your own keywords into .NET languages.
For example, I thought it would be good to have the C++ asm keyword in C#.
Remember, I'm not asking how to implement the asm keyword, but about a general way to add a keyword to C#.
My imagined code :
asm{
mov ax,1
add ax,4
}
So is there a way to achieve this?
Answers that cover implementing a general keyword { } construct are enough for this question.
This isn't possible at the moment. However, there's a Microsoft project in development called Roslyn that can be summarised as "the compiler as a service." It allows you, amongst other things, to extend or modify the behaviour of the compiler through an API.
When Roslyn becomes available, I believe this should be something that (with caution!) is quite doable.
You can use whatever tools you would like to pre-process your code before sending it to the C# compiler. For example, you might use VS macros to do the pre-processing, mapping a given syntax that you invented into something that does compile into C# code, possibly generating an error if there is a problem. If VS macros aren't powerful enough for you then you can always use your own IDE that does whatever you code it to do to the text before sending it to the compiler.
There is no built in support in the compiler for specifying your own keywords/syntax; you would need to handle it entirely independent of the compiler.
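As an illustration of that pre-processing route, the sketch below rewrites an invented keyword into ordinary C# before the text reaches the compiler. The unless keyword and the Preprocessor class are hypothetical; the rewrite is purely textual (it does not skip strings or comments), and compiler errors will point at the rewritten code rather than at what the user typed.

```csharp
using System;
using System.Text;

public static class Preprocessor
{
    // Rewrites "unless (cond)" into "if (!(cond))". Parentheses are matched by depth
    // counting, so nested calls such as unless (Check(a, b)) are handled.
    public static string ExpandUnless(string source)
    {
        const string keyword = "unless";
        var output = new StringBuilder();
        int i = 0;
        while (i < source.Length)
        {
            int k = source.IndexOf(keyword, i, StringComparison.Ordinal);
            if (k < 0) { output.Append(source, i, source.Length - i); break; }
            // Only a whole word followed by '(' counts as the keyword.
            bool wordStart = k == 0 || !IsIdentChar(source[k - 1]);
            int j = k + keyword.Length;
            bool wordEnd = j >= source.Length || !IsIdentChar(source[j]);
            while (j < source.Length && char.IsWhiteSpace(source[j])) j++;
            if (!wordStart || !wordEnd || j >= source.Length || source[j] != '(')
            {
                output.Append(source, i, k + keyword.Length - i);   // copy through unchanged
                i = k + keyword.Length;
                continue;
            }
            // Find the ')' matching the '(' at position j.
            int depth = 0, end = j;
            for (; end < source.Length; end++)
            {
                if (source[end] == '(') depth++;
                else if (source[end] == ')' && --depth == 0) break;
            }
            if (end >= source.Length) throw new FormatException("Unbalanced parentheses after 'unless'.");
            string condition = source.Substring(j + 1, end - j - 1);
            output.Append(source, i, k - i).Append("if (!(").Append(condition).Append("))");
            i = end + 1;
        }
        return output.ToString();
    }

    private static bool IsIdentChar(char c) => char.IsLetterOrDigit(c) || c == '_';
}
```

The rewritten text would then be handed to the C# compiler as usual; an asm { } block could not be handled this way, because there is no C# construct to translate it into.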
Unfortunately this is not possible. You can't extend or alter the languages in any way.
You could, in some obscure way, use PostSharp to read and parse strings and transform them into custom code at compile time (a pre-processor). But you would not be very happy with that, as it is very error-prone and you won't get any kind of IntelliSense or code completion for your magic strings.
According to MSDN, keywords are predefined and cannot be altered. So you can't add any, because you would need to tell the compiler how to handle them. In short: no, you can't.
I am asking this question because I haven't yet found any posts that are C#-related, and there might be some built-in methods for this that I couldn't find. If there are, please tell me and I will close this question.
Basically I have the common situation:
User types a function w.r.t. one or two variables into some TextBlock
I take this string analyse it
As a result, I would like to have a delegate to a method that takes one or two inputs (the variables) and returns the function value according to what the user typed in.
Now, I could probably come up with an algorithm (and I would like to do this on my own, because I want to use my brain) that analyses the string step by step to find out what has to be calculated first and in what way. E.g. first scan for parentheses, look for the expression within a group of parentheses and calculate that according to more general functions, etc.
But in the end I would like to "create" a method of this analysis to be easily used as a normal delegate with a couple of arguments that will return the correct function value.
Are there any methods included in C# for that already, or would I have to go and program everything by myself?
As a remark: I don't want to use anybody else's library; only .NET libraries are acceptable for me.
Edit: After Matt pointed out expression trees, I found this thread, which is a good example for my problem.
Edit2: The example pointed out only includes simple functions and will not be useful if I want to include more complex functions such as trigonometric ones or exponentials.
What you are describing is a parser. There are a number of different ways of implementing them, although generally speaking, for complex grammars, a "parser generator" is often used.
A parser generator will take a description of the grammar and convert it into code that will parse text that conforms to the grammar into some form of internal representation that can be manipulated by the program, e.g. a parse tree.
Since you indicate you want to avoid third-party libraries, I'll assume that the use of a parser generator is similarly excluded, which leaves you with implementing your own parser (which fortunately is quite an interesting exercise).
The Wikipedia page on Recursive descent parsers will be particularly useful. I suggest reading through it and perhaps adapting the example code therein to your particular use case. I have done this myself a number of times for different grammars with this as a starting point, so can attest to its usefulness.
The output from such a parser will be a "parse tree". And you then have a number of possibilities for how you convert this into an executable delegate. One option is to implement an Evaluate() method on your parse tree nodes, which will take a set of variables and return the result of evaluating the user's expression. As others have mentioned, your parse tree could leverage .NET's Expression trees, or you can go down the route of emitting IL directly (permitting you to produce a compiled .NET assembly from the user's expression for later use as required).
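Here is a minimal recursive descent sketch of the Evaluate() option, using only the .NET base class library. The grammar (+ - * / ^, unary minus, parentheses, the variables x and y, and the functions sin, cos, exp and sqrt) and all class names are assumptions for illustration; numbers in exponent notation such as 1e5 are not supported and error reporting is minimal.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Parse-tree nodes: each one can evaluate itself for given values of x and y.
public abstract class Node { public abstract double Evaluate(double x, double y); }

sealed class Leaf : Node // a number or a variable, read directly from (x, y)
{
    private readonly Func<double, double, double> _read;
    public Leaf(Func<double, double, double> read) { _read = read; }
    public override double Evaluate(double x, double y) => _read(x, y);
}

sealed class Call : Node // one-argument function (sin, ..., or negation) applied to a sub-tree
{
    private readonly Func<double, double> _f; private readonly Node _arg;
    public Call(Func<double, double> f, Node arg) { _f = f; _arg = arg; }
    public override double Evaluate(double x, double y) => _f(_arg.Evaluate(x, y));
}

sealed class Binary : Node // two-argument operator applied to two sub-trees
{
    private readonly Func<double, double, double> _op; private readonly Node _l, _r;
    public Binary(Func<double, double, double> op, Node l, Node r) { _op = op; _l = l; _r = r; }
    public override double Evaluate(double x, double y) => _op(_l.Evaluate(x, y), _r.Evaluate(x, y));
}

public sealed class ExpressionParser
{
    private static readonly Dictionary<string, Func<double, double>> Functions =
        new Dictionary<string, Func<double, double>>
        { ["sin"] = Math.Sin, ["cos"] = Math.Cos, ["exp"] = Math.Exp, ["sqrt"] = Math.Sqrt };
    private readonly string _text;
    private int _pos;
    private ExpressionParser(string text) { _text = text; }

    // Parses the text once and returns a delegate that can be called repeatedly.
    public static Func<double, double, double> Compile(string text)
    {
        var p = new ExpressionParser(text);
        Node root = p.ParseExpr();
        p.SkipSpaces();
        if (p._pos < text.Length) throw new FormatException($"Unexpected '{text[p._pos]}' at {p._pos}.");
        return root.Evaluate;
    }

    // expr := term (('+' | '-') term)*      term := unary (('*' | '/') unary)*
    private Node ParseExpr() => ParseLeftAssoc(ParseTerm, '+', (a, b) => a + b, '-', (a, b) => a - b);
    private Node ParseTerm() => ParseLeftAssoc(ParseUnary, '*', (a, b) => a * b, '/', (a, b) => a / b);
    private Node ParseLeftAssoc(Func<Node> operand, char op1, Func<double, double, double> f1,
                                char op2, Func<double, double, double> f2)
    {
        Node left = operand();
        while (true)
        {
            if (Accept(op1)) left = new Binary(f1, left, operand());
            else if (Accept(op2)) left = new Binary(f2, left, operand());
            else return left;
        }
    }

    // unary := '-' unary | primary ('^' unary)?   ('^' is right-associative and binds tighter than '-')
    private Node ParseUnary()
    {
        if (Accept('-')) return new Call(v => -v, ParseUnary());
        Node b = ParsePrimary();
        return Accept('^') ? new Binary(Math.Pow, b, ParseUnary()) : b;
    }
    // primary := number | x | y | function '(' expr ')' | '(' expr ')'
    private Node ParsePrimary()
    {
        if (Accept('(')) { Node inner = ParseExpr(); Expect(')'); return inner; }
        int start = _pos; // Accept has already skipped leading spaces
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (_pos > start)
        {
            double v = double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
            return new Leaf((x, y) => v);
        }
        while (_pos < _text.Length && char.IsLetter(_text[_pos])) _pos++;
        string name = _text.Substring(start, _pos - start);
        if (name == "x") return new Leaf((x, y) => x);
        if (name == "y") return new Leaf((x, y) => y);
        if (!Functions.TryGetValue(name, out Func<double, double> f))
            throw new FormatException($"Unexpected input at position {start}.");
        Expect('('); Node arg = ParseExpr(); Expect(')');
        return new Call(f, arg);
    }
    private void SkipSpaces() { while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++; }
    // Skips spaces, then consumes c if it is the next character.
    private bool Accept(char c)
    {
        SkipSpaces();
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }
    private void Expect(char c) { if (!Accept(c)) throw new FormatException($"Expected '{c}' at {_pos}."); }
}
```

For example, ExpressionParser.Compile("sin(x) ^ 2 + y") returns a Func<double, double, double> that can be invoked like any other delegate. Each call walks the tree; building System.Linq.Expressions nodes instead of these Node classes and compiling them would turn the same parse into IL.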
You might want to look at expression trees.
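With System.Linq.Expressions, the output of such a parser can become a genuinely compiled delegate. The sketch below builds (x, y) => Math.Sin(x) * y + 1 by hand, which is what a parser would do node by node while walking its parse tree. It also shows that trigonometric and exponential functions (the concern in Edit2) are just Expression.Call nodes on System.Math methods. The ExpressionTreeDemo name is hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Builds the expression tree for (x, y) => Math.Sin(x) * y + 1 and compiles it to IL.
    public static Func<double, double, double> BuildSinTimesYPlusOne()
    {
        ParameterExpression x = Expression.Parameter(typeof(double), "x");
        ParameterExpression y = Expression.Parameter(typeof(double), "y");

        // Math.Sin(double) found by reflection; Cos, Exp, Pow etc. are looked up the same way.
        var sin = typeof(Math).GetMethod(nameof(Math.Sin), new[] { typeof(double) });

        Expression body = Expression.Add(
            Expression.Multiply(Expression.Call(sin, x), y),
            Expression.Constant(1.0));

        return Expression.Lambda<Func<double, double, double>>(body, x, y).Compile();
    }
}
```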
Check out NCalc for some examples of how to do this. You don't need to use the library, but reading the source is pretty educational.
I found a very helpful PDF explaining parsing in C# 2.0. This link leads to a very good tutorial on parsers in C#, which later applies them to arithmetic expressions.
As this directly answers my question, I posted it as an answer rather than as a comment or edit.
I have looked at attributes and reflection, and now I know how to use reflection to read an attribute's metadata. But is it possible to make a standalone tool that can analyse a .cs file and extract the attributes used?
What am I trying to do?
Basically I am working on a tool which takes C# code as input. Next step is to see what Attributes are used in that source code. Extract Intrinsic and Custom Attributes.
Problem?
This makes sense if you are using reflection in the same project in which your attributes are defined; however, I do not know in which direction to move to write a separate tool that can give me the extracted statistics and metadata of attributes described above.
Some say I should use Regex to extract the attributes from files, whereas others say I need to use Irony - .NET Language Implementation Kit.
Furthermore
The above work will result in an application that will be used for attribute (annotation) based Design Pattern Recovery from Source Code. I am not sure whether Regex would come to the rescue or whether I need something like Reflection. Reflection deals with runtime; I do not have to deal with runtime, just static file analysis.
If I understood your problem properly, you really need to parse your code. Regex won't help you, as besides parsing attributes you will need to parse the class hierarchy. Reflection might do the trick, but you won't be able to show the results to the user. So the best idea is to use a parser to get a syntax tree from the source, and then investigate it.
If you don't know which parser to choose, I'd recommend Roslyn, as it should be the easiest for parsing C# code (it is designed especially for it). You can find an example of parsing here:
http://blog.filipekberg.se/2011/10/20/using-roslyn-to-parse-c-code-files/
I think it should be really powerful and useful for your task.
Apparently I don't have enough reputation to comment, so I'm gonna have to say this as an answer.
Reflection deals with Runtime Type Information. It is a mechanism for finding out things about a type that you the programmer don't already know about (perhaps someone else is providing you a code library, and forgot to document it). Reflection will give you any information you need about the public contract of a class, including methods, properties, fields, attributes, and interfaces/classes inherited.
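To make the runtime/static distinction concrete: reflection can list attributes, but only on types that are already compiled and loaded, as in this sketch (Sample and AttributeReport are hypothetical names). It cannot look inside a .cs file that has not been compiled, which is why the question's static analysis needs a parser.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Sample
{
    [Obsolete("Use NewMethod instead")]
    public void OldMethod() { }

    public void NewMethod() { }
}

public static class AttributeReport
{
    // Lists "Method: AttributeType" pairs for the public instance methods declared on `type`.
    public static List<string> Describe(Type type)
    {
        var result = new List<string>();
        var flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in type.GetMethods(flags))
            foreach (Attribute attribute in method.GetCustomAttributes(inherit: false))
                result.Add($"{method.Name}: {attribute.GetType().Name}");
        return result;
    }
}
```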
What you need, however, is a parser. A parser is a standard programming concept: a program that processes files and extracts specific information. You are looking for information in code files, which are not runtime types yet, so reflection has no information about them; you do have your eyes, though, since they are still code files. In the event your eyes are not sufficient (as I suspect they're not, if you asked the question), you need to write a parser. Extracting specific information from a .cs file is pretty simple. A naive regex for an attribute is: \[.+\] (be aware that it also matches array indexers such as a[i], and because .+ is greedy it can run across several bracket pairs on one line).
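If a regex is acceptable as a first pass, anchoring it to the start of a line avoids matching indexers such as a[i], and stopping at the first ] avoids the greedy overrun. The sketch below (AttributeScanner and the line-start heuristic are assumptions) pulls attribute names out of sections like [Serializable] or [Obsolete("x"), DebuggerStepThrough]. It is still fooled by attributes written mid-line, a ] or ) inside string arguments, and commented-out code, which is where a real parser pays off.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AttributeScanner
{
    // "[" at the start of a line, an optional target such as "assembly:", then everything up to the first "]".
    private static readonly Regex Section =
        new Regex(@"^\s*\[(?:\w+\s*:)?\s*(?<body>[^\]]+)\]", RegexOptions.Multiline);

    // Argument lists like ("a, b") or (1, 10); removed so their commas don't split names.
    private static readonly Regex Arguments = new Regex(@"\([^)]*\)");

    public static List<string> FindAttributeNames(string source)
    {
        var names = new List<string>();
        foreach (Match section in Section.Matches(source))
        {
            string withoutArgs = Arguments.Replace(section.Groups["body"].Value, "");
            foreach (string part in withoutArgs.Split(','))
            {
                string name = part.Trim();
                if (name.Length > 0) names.Add(name);
            }
        }
        return names;
    }
}
```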
Let's say I have a WinForm App...written in C#.
Is it possible?
In the end, I set my sights on IronPython.
C# is not interpreted, so unlike JavaScript or other interpreted languages you can't do that natively. You can go four basic routes, listed here in order of least to most complex:
1) Provide a fixed set of operations that the user can apply. Parse the user's input, or provide checkboxes or other UI elements to indicate that a given operation should be applied.
2) Provide a plugin-based or otherwise dynamically defined set of operations. Like #1, this has the advantage of not needing special permissions like full trust. MEF might come in handy for this approach: http://mef.codeplex.com/
3) Use a dynamic C# compilation framework like paxScript: http://eco148-88394.innterhost.net/paxscriptnet/. This would, in theory, allow you to compile small C# snippets on demand.
4) Use IL Emit statements to parse code and generate your operations on the fly. This is by far the most complex solution, likely requires full trust, and is extremely error prone. I don't recommend it unless you have some very obscure requirements and sophisticated users.
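Routes 1 and 2 can share one shape: the user picks operations by name from a registry that is either hard-coded or discovered at startup. A minimal sketch, assuming a hypothetical IOperation interface and discovering implementations by plain reflection over one assembly (here the one that defines the interface). MEF, suggested in route 2, provides the same kind of discovery through its own export/import attributes instead of a hand-written scan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical plugin contract: one named operation on a number.
public interface IOperation
{
    string Name { get; }
    double Apply(double input);
}

public sealed class Square : IOperation
{
    public string Name => "square";
    public double Apply(double input) => input * input;
}

public sealed class Negate : IOperation
{
    public string Name => "negate";
    public double Apply(double input) => -input;
}

public static class OperationRegistry
{
    // Finds every concrete IOperation with a public parameterless constructor in the assembly
    // and indexes one instance of each by (case-insensitive) name.
    public static Dictionary<string, IOperation> Discover(Assembly assembly)
    {
        return assembly.GetTypes()
            .Where(t => typeof(IOperation).IsAssignableFrom(t) && !t.IsAbstract && !t.IsInterface
                        && t.GetConstructor(Type.EmptyTypes) != null)
            .Select(t => (IOperation)Activator.CreateInstance(t))
            .ToDictionary(op => op.Name, StringComparer.OrdinalIgnoreCase);
    }

    // Applies the user's chosen operations in order, e.g. { "square", "negate" }.
    public static double Run(IReadOnlyDictionary<string, IOperation> ops, double value, IEnumerable<string> names)
    {
        foreach (string name in names)
        {
            if (!ops.TryGetValue(name, out IOperation op))
                throw new ArgumentException($"Unknown operation '{name}'.");
            value = op.Apply(value);
        }
        return value;
    }
}
```

For a plugin folder, Discover could be called once per DLL loaded with Assembly.LoadFrom(path); only precompiled code is ever executed, so no user-typed source has to be compiled.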
The CSharpCodeProvider class will do what you want. For a (VERY outdated, but still working with a few tweaks) example of its use, check out CSI.
If you are willing to consider targeting the Mono runtime, the type Mono.CSharp.Evaluator provides an API for evaluating C# expressions and statements at runtime.