Developing Abstract Syntax Tree

I've scoured the internet looking for some newbie information on developing C# Abstract Syntax Trees, but I can only find information for people already 'in-the-know'. I am a line-of-business application developer so topics like these are a bit over my head, but this is for my own education so I'm willing to spend the time and learn whatever concepts are necessary.
Generally, I'd like to learn about the techniques behind developing an abstract representation of code from a code string. More specifically, I'd like to be able to use this AST to do C# syntax highlighting. (I realize that syntax highlighting doesn't necessarily need an AST, but this seems like a good opportunity to learn some "compiler"-level techniques.)
I apologize if this question is a bit broad, but I'm not sure how else to ask.
Thanks!

First you need to understand what parsing is, and what abstract syntax trees are. For this, you can consult Wikipedia on abstract syntax trees for a first look.
You really need to spend some time with a compiler text book to understand how abstract syntax trees are related to parsing, and can be constructed while parsing; the classic reference is Aho/Ullman/Sethi's "Compilers" book (easily found on the web). You may find the SO answer to Are there any "fun" ways to learn about Languages, Grammars, Parsing and Compilers? instructive.
Once you understand how to build an AST for a simple grammar, you can then turn your attention to something like C#. The issue here is sheer scale; it is one thing to play with a toy language with 20 grammar rules. It is another to work with a grammar of several hundred or a thousand rules. Experience with small ones will make it a lot easier to understand how the big ones are put together, and how to live with them.
You probably don't want to build your own C# grammar (or implement the one from the C# standard); it's quite a lot of work. You can get available tools that will hand you C# ASTs (Roslyn has already been mentioned; ANTLR has a C# parser; there are many more).
It is true that you might use an AST for syntax highlighting (although that is probably killing a gnat with a sledgehammer). What most people don't think much about (but the compiler books emphasize), is what happens after you have an AST; mostly they aren't useful by themselves. You actually need a lot more machinery to do anything interesting.
Rather than repeat this over and over (I keep seeing the same kind of questions), you can see my discussion on Life After Parsing for more details.
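To make the "toy language" step above concrete, here is a minimal sketch of a hand-written recursive descent parser that builds an AST for arithmetic expressions. The grammar, the class names (Node, NumberNode, BinaryNode, ToyParser) and the ToString format are all assumptions made for illustration; none of it comes from the answers here, and a real C# grammar is vastly larger.

```csharp
using System;
using System.Globalization;

// AST node types for a toy arithmetic grammar (hypothetical, for illustration):
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
public abstract class Node { }

public sealed class NumberNode : Node
{
    public double Value { get; }
    public NumberNode(double value) { Value = value; }
    public override string ToString() => Value.ToString(CultureInfo.InvariantCulture);
}

public sealed class BinaryNode : Node
{
    public char Op { get; }
    public Node Left { get; }
    public Node Right { get; }
    public BinaryNode(char op, Node left, Node right) { Op = op; Left = left; Right = right; }
    // Prints the tree fully parenthesised so its shape is visible.
    public override string ToString() => "(" + Left + " " + Op + " " + Right + ")";
}

public sealed class ToyParser
{
    private readonly string _text;
    private int _pos;

    private ToyParser(string text) { _text = text; }

    public static Node Parse(string text)
    {
        var parser = new ToyParser(text);
        Node result = parser.ParseExpr();
        if (parser.Peek() != '\0')
            throw new FormatException("Unexpected input at position " + parser._pos);
        return result;
    }

    // Skips whitespace, then returns the next character without consuming it
    // ('\0' at end of input).
    private char Peek()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
        return _pos < _text.Length ? _text[_pos] : '\0';
    }

    // One method per grammar rule; each returns the subtree it recognised.
    private Node ParseExpr()
    {
        Node left = ParseTerm();
        while (Peek() == '+' || Peek() == '-')
        {
            char op = _text[_pos++];
            left = new BinaryNode(op, left, ParseTerm());
        }
        return left;
    }

    private Node ParseTerm()
    {
        Node left = ParseFactor();
        while (Peek() == '*' || Peek() == '/')
        {
            char op = _text[_pos++];
            left = new BinaryNode(op, left, ParseFactor());
        }
        return left;
    }

    private Node ParseFactor()
    {
        if (Peek() == '(')
        {
            _pos++;
            Node inner = ParseExpr();
            if (Peek() != ')') throw new FormatException("Expected ')' at position " + _pos);
            _pos++;
            return inner;
        }
        int start = _pos;
        while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException("Expected a number at position " + _pos);
        string digits = _text.Substring(start, _pos - start);
        return new NumberNode(double.Parse(digits, CultureInfo.InvariantCulture));
    }
}
```

Calling ToyParser.Parse("1 + 2 * 3").ToString() shows how operator precedence ends up encoded in the tree shape rather than in the text. The tree is only the starting point; as noted above, anything interesting needs further machinery on top of it.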

You should probably take a look at this talk by Phil Trelford:
Write your own compiler in 24 hours
This man is a genius, and will leave you fired up to learn about compilers. He explains it literally easily enough for a five year old to understand. The five year old in question is his son, so probably has an unfair advantage, but five is five.

Take a look at Roslyn. I think it could be what you're looking for. It gives you access to the compiler's AST, among lots of other amazing things!
http://blogs.msdn.com/b/visualstudio/archive/2011/10/19/introducing-the-microsoft-roslyn-ctp.aspx
Beyond that, I suggest a textbook on compilers.

Why isn't string concatenation automatically converted to StringBuilder in C#? [duplicate]

Possible Duplicate:
Why is String.Concat not optimized to StringBuilder.Append?
One day I was ranting about a particular Telerik control to a friend of mine. I told him that it took several seconds to generate a controls tree, and after profiling I found out that it is using a string concatenation in a loop instead of a StringBuilder. After rewriting it worked almost instantaneously.
So my friend heard that and seemed to be surprised that the C# compiler didn't do that conversion automatically like the Java compiler does. Reading many of Eric Lippert's answers I realize that this feature didn't make it because it wasn't deemed worthy enough. But if, hypothetically, costs were small to implement it, what rationale would stop one from doing it?

But if, hypothetically, costs were small to implement it, what rationale would stop one from doing it?
It sounds like you're proposing a bit of a tautology: if there is no reason to not do X, then is there a reason to not do X? No.
I see little value in knowing the answers to hypothetical, counterfactual questions. Perhaps a better question to ask would be a question about the real world:
Are there programming languages that use this optimization?
Yes. In JScript.NET, we detect string concatenations in loops and the compiler turns them into calls to a string builder.
That might then be followed up with:
What are some of the differences between JScript .NET and C# that justify the optimization in the one language but not in the other?
A core assumption of JScript.NET is that its programmers are mostly going to be JavaScript programmers, and many of them will have already built libraries that must run in any implementation of ECMAScript. Those programmers might not know the .NET framework well, and even if they do, they might not be able to use StringBuilder without making their library code non-portable. It is also reasonable to assume that JavaScript programmers may be either novice programmers, or programmers who came to programming via their line of business rather than a course of study in computer science.
C# programmers are far more likely to know the .NET framework well, to write libraries that work with the framework, and to be experienced programmers who understand why looped string concatenation is O(n²) in the naive implementation. They need this optimization generated by the compiler less because they can just do it themselves if they deem it necessary.
In short: compiler features are about spending our budget to add value for the customer; you get more "bang for buck" adding the feature to JScript.NET than you do adding it to C#.

The C# compiler does better than that.
a + b + c is compiled to String.Concat(a, b, c), which is faster than StringBuilder.
"a" + "b" is compiled directly to "ab" (useful for multi-line literals).
The only place to use StringBuilder is when concatenating repetitively inside a loop; the compiler cannot easily optimize that.
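The loop case described above can be sketched as follows. The class and method names (ConcatDemo, JoinWithConcat, JoinWithBuilder) are made up for illustration; both methods produce the same string, and the difference is only in how much copying happens along the way.

```csharp
using System.Text;

// Hypothetical helper class, for illustration only.
public static class ConcatDemo
{
    // Each += allocates a new string and copies everything built so far,
    // so the total copying grows quadratically with the number of items.
    public static string JoinWithConcat(string[] items)
    {
        string result = "";
        foreach (string item in items)
            result += item;
        return result;
    }

    // StringBuilder appends into a growable buffer, so the total copying
    // stays roughly linear in the length of the final string.
    public static string JoinWithBuilder(string[] items)
    {
        var sb = new StringBuilder();
        foreach (string item in items)
            sb.Append(item);
        return sb.ToString();
    }
}
```

For a handful of items the difference is not worth worrying about; it only matters when the loop runs many times, as in the control-tree case from the question.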

Rules/guidelines for documenting C# code? [closed]

I am a relatively new developer and have been assigned the task of documenting code written by an advanced C# developer. My boss told me to look through it, and to document it so that it would be easier to modify and update as needed.
My question is: Is there a standard type of Documentation/Comment structure I should follow? My boss made it sound like everyone knew exactly how to document the code to a certain standard so that anyone could understand it.
I am also curious if anyone has a good method for figuring out unfamiliar code or function uncertainty. Any help would be greatly appreciated.

The standard seems to be XML Doc (MSDN Technet article here).
You can use /// at the beginning of each line of documentation comments. There are standard XML style elements for documenting your code; each should follow the standard <element>Content</element> usage. Here are some of the elements:
<c> Used to differentiate code font from normal text
<c>class Foo</c>
<code>
<example>
<exception>
<para> Used to control formatting of documentation output.
<para>The <c>Foo</c> class...</para>
<param>
<paramref> Used to refer to a previously described <param>
If <paramref name="myFoo" /> is <c>null</c> the method will...
<remarks>
<returns>
<see> Creates a cross-ref to another topic.
The <see cref="System.String" /><paramref name="someString"/>
represents...
<summary> A description (summary) of the code you're documenting.
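Put together, several of the elements listed above might look like this on a small class. Greeter and Greet are hypothetical names invented for the example; the tags themselves are the standard ones from the list.

```csharp
using System;

/// <summary>
/// Formats greetings. (Hypothetical class, used only to show the tags in context.)
/// </summary>
/// <remarks>
/// <para>The <c>Greeter</c> class holds no state.</para>
/// <para>Results are ordinary <see cref="System.String" /> values.</para>
/// </remarks>
public static class Greeter
{
    /// <summary>Builds a greeting for the given name.</summary>
    /// <param name="name">The name to greet.</param>
    /// <returns>A string of the form <c>Hello, name!</c>.</returns>
    /// <exception cref="System.ArgumentNullException">
    /// Thrown if <paramref name="name" /> is <c>null</c>.
    /// </exception>
    /// <example>
    /// <code>
    /// string text = Greeter.Greet("World");
    /// </code>
    /// </example>
    public static string Greet(string name)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));
        return "Hello, " + name + "!";
    }
}
```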

Sounds like you really did end up getting the short straw.
Unfortunately I think you've stumbled on one of the more controversial subjects of software development in general. Comments can be extremely helpful where necessary, and unnecessary cruft when used wrongly. You'll have to decide carefully what goes where.
As far as commenting practice, it's usually down to the corporation or the developer. A few common rules I like to use are:
Comment logic that isn't clear (and consider a refactor)
Only Xml-Doc methods / properties that could be questioned (or, if you need to give a more detailed overview)
If your comments exceed the length of the containing method / class, you might want to think about comment verbosity, or even consider a refactor.
Try and imagine a new developer coming across this code. What questions would they ask?
It sounds like your boss is referring to commenting logic (most probably so that you can start understanding it) and using xml-doc comments.
If you haven't used xml-doc comments before, check out this link which should give you a little guidance on use and where appropriate.
If your workload is looking a little heavy (i.e., lots of code to comment), I have some good news for you: there's an excellent plugin for Visual Studio that may help you out with xml-doc comments. GhostDoc can make xml-doc commenting of methods, classes, etc. much easier (but remember to change the default placeholder text it inserts!)
Remember, you may want to check with your boss on just what parts of the code he wants documented before you go on a ghostdoc spree.

It's a bit of a worry that the original programmer didn't bother to do one of the most important parts of his job. However, there are lots of terrible "good" programmers out there, so this isn't really all that unusual.
However, getting you to document the code is also a pretty good training mechanism - you have to read and understand the code before you can write down what it does, and as well as gaining knowledge of the systems, you will undoubtedly pick up a few tips and tricks from the good (and bad!) things the other programmer has done.
To help get your documentation done quickly and consistently, you might like to try my add-in for Visual Studio, AtomineerUtils Pro Documentation. This will help with the boring grunt work of creating and updating the comments, make sure the comments are fully formed and in sync with the code, and let you concentrate on the code itself.
As to working out how the code works...
Hopefully the class, method, parameter and variable names will be descriptive. This should give you a pretty good starting point. You can then take one method or class at a time and determine if you believe that the code within it delivers what you think the naming implies. If there are unit tests then these will give a good indication of what the programmer expected the code to do (or handle). Regardless, try to write some (more) unit tests for the code, because thinking of special cases that might break the code, and working out why the code fails some of your tests, will give you a good understanding of what it does and how it does it. Then you can augment the basic documentation you've written with the more useful details (can this parameter be null? what range of values is legal? What will the return value be if you pass in a blank string? etc)
This can be daunting, but if you start with the little classes and methods first (e.g. that Name property that just returns a name string) you will gain familiarity with the surrounding code and be able to gradually work your way up to the more complex classes and methods.
Once you have basic code documentation written for the classes, you should then be in a position to write external overview documentation that describes how the system as a whole functions. And then you'll be ready to work on that part of the codebase because you'll understand how it all fits together.
I'd recommend using XML documentation (see the other answers) as this is immediately picked up by Visual Studio and used for intellisense help. Then anyone writing code that calls your classes will get help in tooltips as they type the code. This is such a major bonus when working with a team or a large codebase, but many companies/programmers just don't realise what they've been missing, banging their (undocumented) rocks together in the dark ages :-)

I suspect your boss is referring to the following XML Documentation Comments.
XML Documentation Comments (C# Programming Guide)

It might be worth asking your boss if he has any examples of code that is already documented so you can see first-hand what he is after.
Mark Needham has written a few blog posts about reading/documenting code (see Archive for the ‘Reading Code’ Category).
I remember reading Reading Code: Rhino Mocks some time ago that talks about diagramming the code to help keep track of where you are and to 'map out' what's going on.
Hope that helps - good luck!

Looking for a few good C# interview problems [closed]

I do not want to ask candidates questions, but rather give them several problems to resolve. The reason for this is that I've seen people be excellent with theory, but when confronted by a real world c# issue, just couldn't hack it.
These c# problems should be simple enough that it won't take more than 1-20 minutes to resolve, yet complicated enough that I'd be able to weed out candidates that can't code.
Right now, I typically ask the applicants to reverse a string and remove duplicates from a List. This alone weeds out a large number of people.
Any other examples I could use?
Edit: I should have mentioned that this is for a standard c# gig, where they'll be writing business code rather than finding the most optimal way to implement a linked list.
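For reference, the two warm-up problems mentioned above could be answered along these lines. This is one possible sketch, not a model answer; the names (WarmUps, Reverse, RemoveDuplicates) are invented here, and the reversal works on UTF-16 code units, so it would split surrogate pairs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class WarmUps
{
    // Reverses the UTF-16 code units of the string.
    public static string Reverse(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Keeps the first occurrence of each item and preserves the original order.
    public static List<T> RemoveDuplicates<T>(IEnumerable<T> items)
    {
        var seen = new HashSet<T>();
        var result = new List<T>();
        foreach (T item in items)
        {
            if (seen.Add(item))   // Add returns false if the item was already present
                result.Add(item);
        }
        return result;
    }
}
```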

I like picking simple problems that I actually had to solve at some point; it doesn't get more relevant to the job than that.
When I worked on VBScript I'd ask college candidates how to write a simplified version of DateDiff, since doing so was what I did my first real day of work at Microsoft. More advanced candidates I would ask how to build a device which tracks the relationship between 32 bit handles and an associated 64 bit pointer, which again I actually had to do when working on VBScript.
More recently I tend to ask questions about tree manipulation algorithms, since the compiler is all about tree manipulation. Or about how to codegen new operators using monads, since that's how LINQ works.
My point is not that you should use questions in these areas, my point is that surely you must have had problems that you had to solve in your day-to-day work. Ask the candidates about those problems -- then you'll learn how they solve a realistic problem, and they'll learn what sorts of problems they'd be solving if they came to work with you.

Don't ask for knowledge of class libraries or obscure corners of the language (unsafe, dynamic, ...); smart people can pick these up or look them up.
I would ask them to design a class hierarchy to represent something from the real world (vehicles, animals, ...). This usually flushes out the people who don't get objects. Make them do it with interfaces too. Also make them reverse a string: no harm in oldies but goldies.

I agree with you, it is surprising how many people claim to be experienced and you find out that all that they did was read the box…
I don’t know if testing for C# is as valuable as it first seems… sure you could ask them to describe an example of when they needed to use inheritance, or why casting might have a performance problem, etc. But these are easy to study for. You would be surprised at how many interviewees give the example using “car” or “color” when giving their real world example of inheritance…. Guess they are in a book somewhere.
When looking at this problem it helps me when I compare experience in development to learning Spanish. A short time into the class everyone is conjugating verbs and can pass a test on this… but nobody speaks Spanish yet. You want the guy that claims to speak Spanish and can actually do it.
So I like to be more specific with the other technologies that will tell me if they have traveled the well-worn path of development. If they say they are an ASP.Net developer I ask them simple questions, but ones that are on the path
EXAMPLES: Give me an example of where the connection string could live? If you need to pass an ID from one page to another, what are your options? If a page takes 5 minutes to load, tell me how you would go about troubleshooting it. If I had a web page that had a single button on it, how would I center that button? Tell me the difference between storing variables in the viewstate versus session state?
You don’t have to know everything, but eighty percent of the people interviewing for a senior level position will get 10% of these types of questions right. (And on 70% of the phone interviews you will hear them Googling for the answers – good thing these aren’t the types of questions you can easily Google for.)
SQL Server is about the same. They say they would rate themselves an 8 or 9 in SQL Server development, but then get 10% of the questions right. The questions again are to see if you have been on the well-worn path.
EXAMPLES: If you had a table of customers and a table of orders, how would you find the customers that had no orders? What is a clustered index? If I had a table of developers and a table of projects, how would I set it up so that projects could have multiple developers on it and developers could be on multiple projects?
How could you develop in SQL Server for “years” and not have hit these concepts? A high percentage of candidates get almost none of these answers right!! (I guess the SQL Server box isn’t as informative.)
So if you say you are a senior level guy and you can say “Soy un revelador de software” (I am a software developer), but can’t say “He hecho eso antes” (I have done that before), I don’t think you are the senior level person you are claiming to be.
Now this tells you if they have been on the well-worn path, but not if they are smart and have good problem solving skills. Having gone thru a ton of these types of interviews I can tell you that by the time the process is done you will be satisfied with having enough information to have a strong opinion on both of these issues. You might also see that by then giving them a problem set to solve is unnecessary.

Show them a small section of code or architecture diagram from one of your own projects and ask them to suggest how they would refactor it. Even if you don't wind up hiring them, you might get some interesting suggestions on ways to improve your code.

Building on Eric's and other answers here, but answering as an only-ever-so-far-interviewee, what I would like in an interview is a kind of pair-programming 'test', where you sit down together facing the screen, and talk through a real-world problem.
I think there would be many advantages:
For the interviewee, being in front of a screen instead of facing the interviewer makes it easier to think about the problem rather than the interview.
For the interviewer, being with the interviewee while they look through the code and ask questions about the problem space would give a much greater insight into how the interviewee thinks, how they approach problems, and how they communicate and interact with others.
I would expect that it's more important and interesting to see a candidate thinking round the edges of your real-world problem, even if they don't completely solve it, than to have them get 10 out of 10 on some algorithmic test.

Something mildly algorithmic.
Write a method that returns true if a string is a palindrome, and false otherwise.
Re-implement the String.Substring(int, int) method.
Something about object-oriented design too.
Design a checkers game (ie, define the classes and some of the methods).
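As a rough idea of what acceptable answers to the two algorithmic problems might look like, here is a minimal sketch. The class name StringExercises is invented, the palindrome check is a plain character-by-character comparison (case- and punctuation-sensitive, which is an assumption about what the interviewer wants), and the Substring version only aims to mimic the argument checking of the framework method, not its internals.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class StringExercises
{
    // Compares characters from both ends, moving inward.
    public static bool IsPalindrome(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        for (int i = 0, j = s.Length - 1; i < j; i++, j--)
        {
            if (s[i] != s[j]) return false;
        }
        return true;
    }

    // Copies 'length' characters starting at 'startIndex' into a new string.
    public static string Substring(string s, int startIndex, int length)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (startIndex < 0 || length < 0 || startIndex > s.Length - length)
            throw new ArgumentOutOfRangeException();
        var buffer = new char[length];
        for (int i = 0; i < length; i++)
            buffer[i] = s[startIndex + i];
        return new string(buffer);
    }
}
```

The interesting part in an interview is usually the edge cases: empty strings, a start index equal to the string length, and whether the candidate thinks to ask about case sensitivity.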

One question I was asked, and subsequently ask interviewees, is "Describe how you would make this phone into an application". Have them describe the classes, their properties, methods, interfaces, etc. Then question them on why they chose to implement them in that specific way. It gives you a good idea if they understand how to code, and gives you some insight into how they approach and solve problems.
Also, if you offer a suggestion of how they could have implemented it a different way, it may show you whether they are open to new ideas, criticism, or if they are a team player or not.

Fizz Buzz
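For anyone who hasn't met it, the commonly stated version of the problem is: for the numbers 1 to 100, print "Fizz" for multiples of 3, "Buzz" for multiples of 5, "FizzBuzz" for multiples of both, and the number itself otherwise. A minimal sketch, with the labelling split into a hypothetical Label method so it can be checked separately from the printing:

```csharp
using System;

// Hypothetical class and method names, for illustration only.
public static class FizzBuzz
{
    public static string Label(int n)
    {
        if (n % 15 == 0) return "FizzBuzz";   // multiple of both 3 and 5
        if (n % 3 == 0) return "Fizz";
        if (n % 5 == 0) return "Buzz";
        return n.ToString();
    }

    public static void PrintRange(int from, int to)
    {
        for (int n = from; n <= to; n++)
            Console.WriteLine(Label(n));
    }
}
```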

Patterns/Practices for encapsulating predicates

I'm guessing most of us have to deal with this at some point so I thought I'd ask the question.
When you have a lot of collections in your BLL and you find that you're writing the same old inline (anonymous) predicates over and over then there's clearly a case for encapsulation there but what's the best way to achieve that?
The project I'm currently working on takes the age old, answer all, static class approach (E.g User class and static UserPredicates class) but that seems somewhat heavy-handed and a little bit of a cop out.
I'm working in C# mostly so keeping in that context would be most helpful, but I think this is a generic enough question to warrant hearing about other languages.
Also I expect there will be a difference in how this might be achieved with the advent of LINQ and Lambdas so I'd be interested in hearing how this could be done in both .Net2.0 and 3.0/3.5 styles.
Thanks in advance.

Specification pattern might be worth checking out.
With some polymorphism & usage of generics it should work.

A Predicate is essentially just an implementation of the Specification design pattern. You can read about the Specification pattern in Domain-Driven Design.
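A minimal sketch of how the Specification pattern could replace repeated inline predicates is shown below. Every name in it (ISpecification, User, ActiveUserSpec, MinimumAgeSpec, AndSpec) is an assumption made for the example rather than anything from the question's codebase.

```csharp
using System;
using System.Collections.Generic;

// A specification is a named, reusable predicate over T.
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

// Hypothetical domain class, for illustration only.
public sealed class User
{
    public string Name { get; set; }
    public int Age { get; set; }
    public bool IsActive { get; set; }
}

public sealed class ActiveUserSpec : ISpecification<User>
{
    public bool IsSatisfiedBy(User candidate) { return candidate.IsActive; }
}

public sealed class MinimumAgeSpec : ISpecification<User>
{
    private readonly int _minAge;
    public MinimumAgeSpec(int minAge) { _minAge = minAge; }
    public bool IsSatisfiedBy(User candidate) { return candidate.Age >= _minAge; }
}

// Combines two specifications; generic, so it works for any T.
public sealed class AndSpec<T> : ISpecification<T>
{
    private readonly ISpecification<T> _left;
    private readonly ISpecification<T> _right;

    public AndSpec(ISpecification<T> left, ISpecification<T> right)
    {
        _left = left;
        _right = right;
    }

    public bool IsSatisfiedBy(T candidate)
    {
        return _left.IsSatisfiedBy(candidate) && _right.IsSatisfiedBy(candidate);
    }
}
```

Because IsSatisfiedBy has the shape of a Predicate<T>, a specification can be handed to List<T>.FindAll as a method group, e.g. users.FindAll(spec.IsSatisfiedBy), and in 3.5 code it can equally be passed to Where. Compared with a static predicates class, the specifications can carry parameters (the minimum age here) and be composed.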

What's the best way of parsing strings? [closed]

We've got a scenario that requires us to parse lots of e-mail (plain text), each e-mail 'type' is the result of a script being run against various platforms. Some are tab delimited, some are space delimited, some we simply don't know yet.
We'll need to support more 'formats' in the future too.
Do we go for a solution using:
Regex
Simply string searching (using string.IndexOf etc)
Lex/ Yacc
Other
The overall solution will be developed in C# 2.0 (hopefully 3.5)

Regex.
Regex can solve almost everything except for world peace. Well maybe world peace too.

The three solutions you stated each cover very different needs.
Manual parsing (simple text search) is the most flexible and the most adaptable; however, it very quickly becomes a real pain in the ass as the parsing required becomes more complicated.
Regexes are a middle ground, and probably your best bet here. They are powerful, yet flexible, as you can add more logic yourself in the code that calls the different regexes. The main drawback here would be speed.
Lex/Yacc is really only suited to very complicated, predictable syntaxes and lacks a lot of post-compile flexibility. You can't easily change parsers in mid-parse; well, actually you can, but it's just too heavy and you'd be better off using regex instead.
I know this is a cliché answer, it all really comes down to what your exact needs are, but from what you said, I would personally probably go with a bag of regex.
As an alternative, as Vaibhav pointed out, if you have several different situations that can arise and you can easily detect which one is coming, you could make a plugin system that chooses the right algorithm, and those algorithms could all be very different, one using Lex/Yacc in pointy cases and the other using IndexOf and regex for simpler cases.

You probably should have a pluggable system regardless of which type of string parsing you use. So, this system calls upon the right 'plugin' depending on the type of email to parse it.

You must architect your solution to be updatable, so that you can handle unknown situations when they crop up. Create an interface for parsers that contains not only methods for parsing the emails and returning results in a standard format, but also for examining the email to determine if the parser will execute.
Within your configuration, identify the type of parser you wish to use, set its configuration options, and the configuration for the identifiers which determine if a parser will act or not. Name the parsers by assembly qualified name so that the types can be instantiated at runtime even if there aren't static links to their assemblies.
Identifiers can implement an interface as well, so you can create different types that check for different things. For instance, you might create a regex identifier, which parses the email for a specific pattern. Make sure to make as much information available to the identifier, so that it can make decisions on things like from addresses as well as the content of the email.
When your known parsers can't handle a job, create a new DLL with types that implement the parser and identifier interfaces that can handle the job and drop them in your bin directory.
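The design described above could be sketched roughly as follows. All names (IEmailParser, TabDelimitedParser, ParserRegistry) are hypothetical, the "standard format" is assumed to be a simple key/value dictionary, and for brevity the identifier is folded into a CanParse method on the parser instead of being a separate interface as the answer suggests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plugin contract: examine the email, then parse it.
public interface IEmailParser
{
    bool CanParse(string emailBody);
    IDictionary<string, string> Parse(string emailBody);
}

// Example plugin: treats each line as "key<TAB>value".
public sealed class TabDelimitedParser : IEmailParser
{
    public bool CanParse(string emailBody)
    {
        return emailBody.IndexOf('\t') >= 0;
    }

    public IDictionary<string, string> Parse(string emailBody)
    {
        var result = new Dictionary<string, string>();
        string[] lines = emailBody.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string[] parts = line.TrimEnd('\r').Split('\t');
            if (parts.Length >= 2) result[parts[0]] = parts[1];
        }
        return result;
    }
}

public sealed class ParserRegistry
{
    private readonly List<IEmailParser> _parsers = new List<IEmailParser>();

    public void Register(IEmailParser parser) { _parsers.Add(parser); }

    // Instantiates a parser from an assembly-qualified type name, e.g. one read
    // from configuration, so no compile-time reference to the plugin is needed.
    public void Register(string assemblyQualifiedName)
    {
        Type type = Type.GetType(assemblyQualifiedName, true);
        _parsers.Add((IEmailParser)Activator.CreateInstance(type));
    }

    // Hands the email to the first registered parser that recognises it.
    public IDictionary<string, string> Parse(string emailBody)
    {
        foreach (IEmailParser parser in _parsers)
        {
            if (parser.CanParse(emailBody)) return parser.Parse(emailBody);
        }
        throw new NotSupportedException("No registered parser recognises this email.");
    }
}
```

Supporting a new email format then means writing one more IEmailParser implementation and registering its type name, without touching the existing parsers.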

It depends on what you're parsing. For anything beyond what Regex can handle, I've been using ANTLR. Before you jump into recursive descent parsing for the first time, I would research how such parsers work before attempting to use a framework like this one. If you subscribe to MSDN Magazine, check the Feb 2008 issue where they have an article on writing one from scratch.
Once you get the understanding, learning ANTLR will be a ton easier. There are other frameworks out there, but ANTLR seems to have the most community support and public documentation. The author has also published The Definitive ANTLR Reference: Building Domain-Specific Languages.

Regex would probably be your best bet, tried and proven. Plus a regular expression can be compiled.

Your best bet is RegEx because it provides a much greater degree of flexibility than any of the other options.
While you could use IndexOf to handle some things, you may quickly find yourself writing code that looks like:
if(s.IndexOf("search1")>-1 || s.IndexOf("search2")>-1 ||...
That can be handled in one RegEx statement. Plus, there are a lot of places like RegExLib.com where you can find folks who have shared regular expressions to solve problems.
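As a sketch of what that single statement might look like for the two search terms shown above (the class and member names are invented, and the pattern covers only those two terms since the rest of the chain is elided):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class SearchDemo
{
    // Alternation replaces the chain of IndexOf calls.
    private static readonly Regex AnyTerm = new Regex("search1|search2");

    public static bool ContainsAnyTerm(string s)
    {
        return AnyTerm.IsMatch(s);
    }
}
```

If the search terms come from user input or configuration, they should be passed through Regex.Escape before being joined with "|".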

@Coincoin has covered the bases; I just want to add that with regex it's particularly easy to end up with hard-to-read, hard-to-maintain code. Regex is a powerful and very compact language, so that's how it often goes.
Using whitespace and comments within the regex can go a long way to make it easier to maintain regexes. Eric Gunnerson turned me on to this idea. Here's an example.
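In .NET this is done with RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. The sketch below uses a made-up pattern for lines of the form "key = value"; the class name, method name and pattern are assumptions for illustration, not the example from the original link.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CommentedRegexDemo
{
    private static readonly Regex KeyValue = new Regex(@"
        ^\s*             # optional leading whitespace
        (?<key>\w+)      # the key: one or more word characters
        \s*=\s*          # an equals sign, optionally surrounded by whitespace
        (?<value>.+?)    # the value: as few characters as possible
        \s*$             # optional trailing whitespace, then end of input
        ", RegexOptions.IgnorePatternWhitespace);

    public static bool TryParse(string line, out string key, out string value)
    {
        Match m = KeyValue.Match(line);
        key = m.Success ? m.Groups["key"].Value : null;
        value = m.Success ? m.Groups["value"].Value : null;
        return m.Success;
    }
}
```

Note that with this option a literal space or # in the pattern has to be escaped or placed inside a character class.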

Use PCRE. All other answers are just 2nd Best.

With as little information as you provided, I would choose Regex.
But what kind of information you want to parse and what you would want to do will change the decision to Lex/Yacc maybe..
But it looks like you've already made your mind up with String search :)
