Reflection or Regex on Custom attributes

Reflection or Regex on Custom attributes - c#

I have seen Attributes and Reflection and now i know how to create and use reflection to see meta data of attribute but is it possible to make a standalone tool that can analyse a cs file and extract attributes used ?
What am I trying to do?
Basically I am working on a tool which takes C# code as input. Next step is to see what Attributes are used in that source code. Extract Intrinsic and Custom Attributes.
Probem?
this makes sense if you are using reflection in same project in which your attributes are defined, however I do not know in what direction I should move to write a separate tool that can give you above extracted statics and meta data of attributes.
Some say I should use Regex to extract the attributes in files where as other say I need to use Irony - .NET Language Implementation Kit
Furthermore
above work will result me to have an application that will be used for attributes(annotation) based Design Pattern Recovery from Source Code. I have less idea if Regex would come to rescue or i need something like Reflection. As Reflection is deals with runtime. I do not have to deal with run time. just static files analysis

If I properly understood your problem, you really need to parse your code. Regex won't help you, as beside parsing attributes you will need to parse class hierarchy. Reflection might do the trick, but you won't be able to show to the user the results. So, the best idea is to use any parser to get an expression tree from the source, and than investigate it.
If you don't know which parser to choose - I'd recommend Rosalyn, as it should be easiest for parsing C# code (it is designed especially for it). You can find an example for parsing here:
http://blog.filipekberg.se/2011/10/20/using-roslyn-to-parse-c-code-files/
I think it should be really powerful and useful for your task

Apparently I don't have enough reputation to comment, so I'm gonna have to say this as an answer.
Reflection deals with Runtime Type Information. It is a mechanism for finding out things about a type that you the programmer don't already know about (perhaps someone else is providing you a code library, and forgot to document it). Reflection will give you any information you need about the public contract of a class, including methods, properties, fields, attributes, and interfaces/classes inherited.
What you need however is a parser. A parser is a standard programming concept that processes files and extracts specific information. You are looking for information in code files, which are not runtime types yet, which means reflection has no information on them yet, however you have your eyes, since they're still code files. In the event your eyes are not sufficient (as I suspect their not if you asked the question) you need to write a parser. Extracting specific information from a cs file is pretty simple. And the regex for an attribute is: \[.+\]

Related

Why we need Reflection at all?

I was studying Reflection, I got some of it but I am not getting everything related to this concept. Why do we need Reflection? What things we couldn't achieve that we need Reflection?

There are many, many scenarios that reflection enables, but I group them primarily into two buckets.
Reflection enables us to write code that analyzes other code.
Consider for example the most basic question about an assembly: what types are in it? Assemblies are self-describing and reflection is the mechanism by which that description is surfaced to other code.
Suppose for example you wanted to write a program which took an assembly and did a graphical display of the relationships between the various classes in that assembly, to help you understand that code. There are such tools. They're in Visual Studio. Someone wrote those tools. They did not appear by magic. Reflection is the mechanism designed into the .NET framework that enables you or me or anyone else to write tools that understand code.
Reflection enables us to move compile time bindings to runtime.
Suppose you have a static method Foo.Bar(). When you put a call to Foo.Bar() in your program, you know with 100% certainty that the method you think is going to be called is actually going to be called. We call static methods "static" because the binding from the name Bar to the code that gets called can be understood statically -- that is, without running the program.
Now consider a virtual method Blah() on a base class. When you call whatever.Blah() you don't know exactly which Blah() will be called at compile time, but you know that some method Blah() with no arguments will be called on some type that is the runtime type of whatever, and that type is equal to or derived from the type which declares Blah(). (In fact you know more: you know that it is equal to or derived from the compile time type of whatever.) Virtual binding is a form of dynamic binding, but it is not fully dynamic. There's no way for the user to decide that this call should be to a different method on a different type hierarchy.
Reflection enables us to make calls that are bound entirely at runtime, based entirely on user choices if we like. We pay a performance penalty, and we lose compile-time type safety, but we gain the flexibility to decide 100% at runtime what code we call. There are scenarios where that's a reasonable tradeoff.

Reflection is such a deep part of the .NET framework that you often don't know that you're doing it (see Attributes and LINQ for instance). And when you do know you're doing it, even if it feels wrong, it might be the only way to achieve a particular objective.
Apart from the two broad areas that Eric mentioned here are a few others. There are lots more, these are just some that come to mind immediately.
Serialization (and similar)
Whether you're using XML or JSON or rolling your own, serializing objects is much easier when you don't have to write specific code for each class to enable serialization. Reflection enables you to enumerate the properties in your object that have been flagged for (or not flagged against) serailization and write them to the output.
This isn't about saving state though. Reflection allows us to write generic methods that can produce business output too, like CSV or XLSX files from an arbitrary collection. I get a lot of mileage out of my ToCSV(...) and ToExcel(...) extensions for things like producing downloadable versions of data sets on my web-based reporting.
Accessing Hidden Data
Yes, I know, this is a dodgy one. And yeah, Eric is probably going to slap me for this, but...
There's a lot of code out there - I'm looking at you, ASP.NET - that hides interesting and useful stuff behind private or protected. Sometimes the only way to get them out is to use reflection. Sometimes it's not the only way, but it can be the simpler way.
Attributes
Every time you tag an Attribute onto one of your classes, methods, etc. you are implicitly providing data that is going to be accessed through reflection. Want to use those attributes yourself? Reflection is the only way you can get at them.
LINQ and Other Expressions
This is really important stuff these days. If you've ever used LINQ to SQL, Entity Frameworks, etc. then you've used Expression in some way. You write a simple little POCO to represent a row in your database table and everything else gets handled by reflection. When you write a predicate expression the system is using the reflection model to build structures that are then processed (visited) to build an SQL statement.
Expressions aren't just for LINQ either, you do some really interesting things yourself, once you know what you're doing. I have code to generate line parsers for CSV import that run pretty damn quickly when compiled to Func<string, TRecord>. These days I tend to use a mapper somebody else wrote, but at the time I needed to slice a few more % off the total import time for a file with 20K records that was uploaded to a website periodically.
P/Invoke Marshalling
This one is a big deal behind the scenes and occasionally in the foreground too. When you want to call a Windows API function or use a native DLL, P/Invoke gives you ways to achieve this without having to mess about with building memory buffers in both directions. The marshalling methods use reflection to do translation of certain things - strings and so on being the obvious example - so that you don't have to get your hands dirty. All based on the Type object that is the foundation of reflection.
Fact is, without reflection the .NET framework wouldn't be what it is. No Attributes, no Expressions, probably a lot less interop between the languages. No automatic marshalling. No LINQ... at least in the way we often use it now.

Parsing C# and VB.Net code to automate the code review process

The customer has million lines of code (VB.Net and C#), and wants us to develop a tool to estimate the quality of the code.
What the information the customer wants to know include:
1）how many lines of comments in one code file
2) how many functions implemented in one class
3) whether all possible exception has been wrapped by a try/catch block
4) how many attributes attached to one function
5) ... (the customer said that the tool we provide should be configured and extensible so that they can implement more ideas later)
We plan to write a VS.Net add-on, which can parse the code of the opening project in time. seems the interesting thing in here is that we need to parsing the code of C# and VB.Net.
Please kindly provide some tips about how to start this interesting task.
Thanks in advanced!

You ask a very broad question, but you should begin by studying existing parser's APIs.
Once you do that you're golden.
For example look at this SO question which provides some parsers for C#. Of course you could write your own but I don't find any reason to since the task isn't very easy.
So you get your AST and once you do that you have all the information you want.
Keep in mind that if you reference a type that isn't in the file you must have to get it from another one, and it could also be a type from .NET. So there is definitely more work to be done.
To go through your list:
1）how many lines of comments in one code file: You could find it through your C# parser of choice. They recognize comment aswell
2) how many functions implemented in one class: Likewise, should be very easy
3) whether all possible exception has been wrapped by a try/catch block: Likewise, just find exception throws (the parser is likely to have a special type for language keywords, so looking for throw should be easy).
4) how many attributes attached to one function: and... Likewise
5) ... (the customer said that the tool we provide should be configured and extensible so that they can implement more ideas later): Shouldn't differ from any other project. Just make sure you're using good design principles, keeping everything abstract, using interfaces wisely, make your work in layers, etc. etc...

You can use Roslyn. For C# you can also use NRefactory.

Have a look at Stylecop, you may be able to add rules get the information you want?
http://stylecop.codeplex.com/

Creating mathematical function in runtime from string

I am asking this question, because I didn't find yet any posts that are C# related and there might be some build in methods for that I couldn't find. If there are, please tell me so and I can close this question.
Basically I have the common situation:
User types a function w.r.t. one or two variables into some TextBlock
I take this string analyse it
As a return I would like to have a delegate to a method that will take one or two inputs (the variables) and return the function value according to what the user typed in.
Now, I could probably think (and I would like to do this on my own, because I want to use my brain) of an algorithm of analysing the string step by step to actually find out, what has to be calculated first and in what way. E.g. First scan for parentheses, look for the expression within a group of parantheses and calculate that according to more general functions etc.
But in the end I would like to "create" a method of this analysis to be easily used as a normal delegate with a couple of arguments that will return the correct function value.
Are there any methods included in C# for that already, or would I have to go and program everything by myself?
As a remark: I don't want to use anybody else' library, only .NET libraries are acceptable for me.
Edit: After Matt pointed out expression trees, I found this thread which is a good example to my problem.
Edit2: The example pointed out does only include simple functions and will not be useful if I want to include more complex functions such as trigonometric ones or exponentials.

What you are describing is a parser. There are a number of different ways of implementing them, although generally speaking, for complex grammars, a "parser generator" is often used.
A parser generator will take a description of the grammar and convert it into code that will parse text that conforms to the grammar into some form of internal representation that can be manipulated by the program, e.g. a parse tree.
Since you indicate you want to avoid third-party libraries, I'll assume that the use of a parser generator is similarly excluded, which leaves you with implementing your own parser (which fortunately is quite an interesting exercise).
The Wikipedia page on Recursive descent parsers will be particularly useful. I suggest reading through it and perhaps adapting the example code therein to your particular use case. I have done this myself a number of times for different grammars with this as a starting point, so can attest to its usefulness.
The output from such a parser will be a "parse tree". And you then have a number of possibilities for how you convert this into an executable delegate. One option is to implement an Evaluate() method on your parse tree nodes, which will take a set of variables and return the result of evaluating the user's expression. As others have mentioned, your parse tree could leverage .NET's Expression trees, or you can go down the route of emitting IL directly (permitting you to produce a compiled .NET assembly from the user's expression for later use as required).

You might want to look at expression trees.

Check out NCalc for some examples of how to do this. You don't need to use the library, but reading the source is pretty educational.

I found a very helpful pdf explaining the parsing in C# 2.0. This link leads to a very good tutorial on parsers used in C# and also applies that later on to an arithmetic expression.
As this directly helps and answers to my question, I posted this as an answer, rather than as a comment or edit.

Is there an easy way to work with application's variables by their names (from text file)?

Can I somehow use a text file with contents like this:
<comment> This is something like an XML file.
<action var="myInteger"> +50
<condition> stringVar == "sometext" <action var="boolVar"> = true
.. parse it and make my app perform some actions?
The idea is to make a user-friendly (example doesn't count) pseudocode that can change app's variables and run methods. Problem is I don't know how to change variables by their names.
Making a separate case for each variable name (explicitly supporting them) would be rather crazy:
switch(varName)
{
case "var1": {/* things */ break;}
case "var2": {/* things */ break;}
/* ... */
case "var9999": {/* things */ break;}
}
Edit: I think I asked the wrong question originally. (And it was Is there an easy way to work with application's variables by executing code from text file?)

In answer to your question: yes.
Now that you've edited your question...
You'll want to parse the XML, preferably using the libraries that ship with .NET. Then you walk the XML tree, executing each node that has some action associated with it.
You probably don't want to expose your app's variables directly. Instead, you should define some execution state that can be manipulated by the XML file. You could have, for example, a dictionary of variables and their values. Then when you get an <action> tag, you look at the var attribute, look up the variable in the dictionary, then change the value to be whatever the contents of the tag specify.
This is not a simple task. It's not necessarily hard to write a language interpreter (which is what you are doing, essentially). But it can be difficult to design your language so that it makes sense. You'll also find that, if you have embedded expressions (which you appear to), then you'll need an expression parser. Again, these are "easy" to construct, but for someone without experience, you'll need to do some research first. You could easily end up constructing something that's very complicated, slow and broken if you don't know about real world parsing techniques.
For expression parsing, look into LL(1) parsers, specifically recursive descent, which is the easiest to understand and implement.
For evaluating the XML input, you'll need a recursive algorithm that walks the tree. This will be similar to your recursive descent parser. In fact, the two are pretty much the same except for the details.
Once you've gotten something going, you should ask an actual question about a particular problem, instead of asking one so broad.
Another edit: use a dictionary for your variables.

For a very simple case where you have a small and fixed set of keyword/syntax for defining the action, I will recommend that you just write a custom parser and use Reflection to access the field/property that you are targeting and custom code on the actual operation. You can leverage on the use of Action<> and Fun<> delegates to isolate/reuse the code implementation for action with clear evaluation path.
But if you are looking at a more complex scenario, then I will recommend that you start looking into DSL. My knowledge in DSL is very limited so I can't say much, but certainly there is a fair bit of learning curve involved. Something like meta# might be a good starting point to abstract away some of the complexity

Sounds like a custom config section will help you. Take a look at http://haacked.com/archive/2007/03/11/custom-configuration-sections-in-3-easy-steps.aspx

I think this screams for application scripting, but binding all the app vars to script is a lot of work (not mentioning learning a new language).
So the best is to use .NET Runtime Compilation feature, then your xml-like file whould be a simple C# file that will be loaded, compiled and executed at runtime and better yet you can make it reference all your app vars very easily.

Code parsing C#

I am researching ways, tools and techniques to parse code files in order to support syntax highlighting and intellisence in an editor written in c#.
Does anyone have any ideas/patterns & practices/tools/techiques for that.
EDIT: A nice source of info for anyone interested:
Parsing beyond Context-free grammars
ISBN 978-3-642-14845-3

My favourite parser for C# is Irony: http://irony.codeplex.com/ - i have used it a couple of times with great success
Here is a wikipedia page listing many more: http://en.wikipedia.org/wiki/Compiler-compiler

There are two basic aproaches:
1) Parse the entire solution and everything it references so you understand all the types involved in the code
2) Parse locally and do your best to guess what types etc are.
The trouble with (2) is that you have to guess, and in some circumstances you just can't tell from a code snippet exactly what everything is. But if you're happy with the sort oif syntax highlighting shown on (e.g.) Stack Overflow, then this approach is easy and quite effective.
To do (1) then you need to do one of (in decreasing order of difficulty):
Parse all the source code. Not possible if you reference 3rd party assemblies.
Use reflection on the compiled code to garner type information you can use when parsing the source.
Use the host IDE's (if avaiable - so not applicable in your case!) code element interfaces to provide the information you need

You could take a look at how http://www.icsharpcode.net/ did it. They wrote a book doing just that, Dissecting a C# Application: Inside SharpDevelop, it even has a chapter called
Implement a parser to provide syntax
highlighting and auto-completion as
users type

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.