Rules/guidelines for documenting C# code? [closed] - c#

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I am a relatively new developer and have been assigned the task of documenting code written by an advanced C# developer. My boss told me to look through it, and to document it so that it would be easier to modify and update as needed.
My question is: Is there a standard type of Documentation/Comment structure I should follow? My boss made it sound like everyone knew exactly how to document the code to a certain standard so that anyone could understand it.
I am also curious if anyone has a good method for figuring out unfamiliar code or function uncertainty. Any help would be greatly appreciated.

The standard seems to be XML Doc (MSDN Technet article here).
You can use /// at the beginning of each line of documentation comments. There are standard XML style elements for documenting your code; each should follow the standard <element>Content</element> usage. Here are some of the elements:
<c> Used to differentiate code font from normal text
<c>class Foo</c>
<code>
<example>
<exception>
<para> Used to control formatting of documentation output.
<para>The <c>Foo</c> class...</para>
<param>
<paramref> Used to refer to a previously described <param>
If <paramref name="myFoo" /> is <c>null</c> the method will...
<remarks>
<returns>
<see> Creates a cross-ref to another topic.
The <see cref="System.String" /><paramref name="someString"/>
represents...
<summary> A description (summary) of the code you're documenting.

Sounds like you really did end up getting the short straw.
Unfortunately I think you've stumbled on one of the more controversial subjects of software development in general. Comments can be seen as extremely helpful where necessary, and unnecessary cruft when used wrongly. You'll have to be careful and decide quite carefully what goes where.
As far as commenting practice, it's usually down to the corporation or the developer. A few common rules I like to use are:
Comment logic that isn't clear (and consider a refactor)
Only Xml-Doc methods / properties that could be questioned (or, if you need to give a more detailed overview)
If your comments exceed the length of the containing method / class, you might want to think about comment verbosity, or even consider a refactor.
Try and imagine a new developer coming across this code. What questions would they ask?
It sounds like your boss is referring to commenting logic (most probably so that you can start understanding it) and using xml-doc comments.
If you haven't used xml-doc comments before, check out this link which should give you a little guidance on use and where appropriate.
If your workloadi s looking a little heavy (ie, lots of code to comment), I have some good news for you - there's an excellent plugin for Visual Studio that may help you out for xml-doc comments. GhostDoc can make xml-doc commenting methods / classes etc much easier (but remember to change the default placeholder text it inserts in there!)
Remember, you may want to check with your boss on just what parts of the code he wants documented before you go on a ghostdoc spree.

It's a bit of a worry that the original programmer didn't bother to do one of the most important parts of his job. However, there are lots of terrible "good" programmers out there, so this isn't really all that unusual.
However, getting you to document the code is also a pretty good training mechanism - you have to read and understand the code before you can write down what it does, and as well as gaining knowledge of the systems, you will undoubtedly pick up a few tips and tricks from the good (and bad!) things the other programmer has done.
To help get your documentation done quickly and consistently, you might like to try my add-in for Visual Studio, AtomineerUtils Pro Documentation. This will help with the boring grunt work of creating and updating the comments, make sure the comments are fully formed and in sync with the code, and let you concentrate on the code itself.
As to working out how the code works...
Hopefully the class, method, parameter and variable names will be descriptive. This should give you a pretty good starting point. You can then take one method or class at a time and determine if you believe that the code within it delivers what you think the naming implies. If there are unit tests then these will give a good indication of what the programmer expected the code to do (or handle). Regardless, try to write some (more) unit tests for the code, because thinking of special cases that might break the code, and working out why the code fails some of your tests, will give you a good understanding of what it does and how it does it. Then you can augment the basic documentation you've written with the more useful details (can this parameter be null? what range of values is legal? What will the return value be if you pass in a blank string? etc)
This can be daunting, but if you start with the little classes and methods first (e.g. that Name property that just returns a name string) you will gain familiarity with the surrounding code and be able to gradually work your way up to the more complex classes and methods.
Once you have basic code documentation written for the classes, you should then be in a position to write external overview documentation that describes how the system as a whole functions. And then you'll be ready to work on that part of the codebase because you'll understand how it all fits together.
I'd recommend using XML documentation (see the other answers) as this is immediately picked up by Visual Studio and used for intellisense help. Then anyone writing code that calls your classes will get help in tooltips as they type the code. This is such a major bonus when working with a team or a large codebase, but many companies/programmers just don't realise what they've been missing, banging their (undocumented) rocks together in the dark ages :-)

I suspect your boss is referring to the following XML Documentation Comments.
XML Documentation Comments (C# Programming Guide)

It might be worth asking your boss if he has any examples of code that is already documented so you can see first-hand what he is after.
Mark Needham has written a few blog posts about reading/documenting code (see Archive for the ‘Reading Code’ Category.
I remember reading Reading Code: Rhino Mocks some time ago that talks about diagramming the code to help keep track of where you are and to 'map out' what's going on.
Hope that helps - good luck!

Related

Is a sudden occurence of an unbearable desire to write some comments, a symptom of a bad design? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
Imagine... You have written some code; implemented some interfaces, used some design patterns, passed many unit tests, extracted many methods for sake of refactoring and so. You developed, developed, developed... But at some mysterious point, for some reason you, something inside you started to whisper you that you should write a comment now. Something pokes you that if you don't write a comment something will be missing or something can be not exactly clear...
Then you comment your code! :(
Can this be a secret sign of a bad design?
If you find yourself in a situation that if you don't comment your code at that spiritual(!) moment, it won't have its mission completed; can we consider it as a reliable symptom of a possible bad design in your code?
If you're feeling like you should add comments because something is missing (documentation for clients perhaps!), then it may not be an indication that your code itself is bad. If you're adding comments because something is unclear, you need to decide whether it's unclear because the code is hard or because the underlying domain is hard. If it's the code, you need to rewrite it to make it clearer. If it's the domain that's making it difficult, feel free to comment it extensively for people who aren't as familiar with the domain in question, and leave the code alone.
While I agree that the code should be as self documenting as possible, this is not always enough. There is only so much you can do without writing comment. A good example is, as one of my mentor said : "Code can only document what you do, not what you dont do".
Even if you have the best possible design, there will always be places where you will need comments to make your intentions clear.
Comments can be usefull to document a tricky technical problem. If a problem is complex, its implementation will be complex and will need documenting.
Comments are also usefull to document business rules. Those rules will be expressed into code, but the code will rarely explain the reasons behind the rules. Comments will make things clearer ...
Sometimes writing comments is easier than cleanup your code to make it clear enough the understand what it does and why. I usually do the later, but comments can be the best approach at times. esp. if you generate javadocs.
I tend to do this when I've already tried several other approaches which didn't work, in an effort to stop myself coming back in a few months, and trying to refactor back to those approaches again.
I put this as an asnwer because I think it's important. Basically I agree with Stuart:
A coding standard at a previous job said to only ever comment code if it is dealing with something exceptionally computationally difficult such as a complex compression or encryption algorithm. That company had the best damn code I have EVER seen and I have worked in a wide range of different types of company.
As much as I agree with everyone who is arguing for self documenting code, there is one place I feel they overlooked where comments are important: Interfaces, and to a lesser extent, abstract methods. The contract on an Interface method is usually (and probably should be) more complicated than can be described in the method name. The only way you can document that is with a comment.
If code falls into standard patterns with sensible method names etc then it's often not necessary to comment. However, often I'll write a quite short routine that does something nifty, usually involving a good dollop of maths and a bit of swishy logic. I'll usually comment every line of something like this, as if I don't I'll come back to it later and look at it and think 'Why is it doing that?' whereas if I've commented it it's so so much easier. It saves me a minute or more of re-thinking the problem every time I look at it.
So to answer the question, No, not necessarily.
Dittos to Guillaume. I strongly disagree with those who say that if you just write your code clearly enough, comments are unnecessary.
Yes, it is surely true that some types of comments are unnecessary if you just use meaningful variable and function names. Like, why write:
x17=bbq*rg; // Multiply price by tax rate to get tax amount
Better to write:
taxAmount=price*taxRate;
And I just find it mind-bogglingly annoying when someone writes a comment that just restates what any competent programmer should be able to instantly tell by reading the code. I don't know how many times I've seen useless comments like:
x=x+1; // Add one to x
Like, wow, thanks for telling me what "+" does. Without the comment I might have had to Google it.
But there are at least three types of comments that I find very useful:
Comments that explain the overall purpose or approach behind some large block of code. Something that gives me a clue why you're doing what you're doing. Like, "// When the delivery location is changed, recalculate the sales tax using the rates applicable to the new location". Sure, I could hack through the code and ultimately figuring out what you're doing, but if you tell me up front, I don't have to figure it out. This isn't supposed to be a game where we see how long it takes me to solve the problem. Just tell me the answer.
Comments that exlain how a function should be called or how to use an API in general. Like, "To retrieve the pricing data, first call contactPricingServer to make the connection, then call getPrice for each item you need a price for, and finally call closeConnection to release the resources." Maybe if I studied all the available functions long enough I could figure that out, but again, this isn't supposed to be a quiz. Just tell me the answer.
Probably the most valuable of all: Comments that explain a piece of code that might not be obvious. Like, "We must look up by sale date and customer number rather than by sale id because the sale id's on the recon server aren't updated until month end." I've had plenty of times that I've looked at some code that looked strange and wondered if it was a bug or if there is some good reason why the programmer did it that way. The problem is not that I cannot figure out what the program is doing. The problem is why it is doing X rather than doing Y, when Y seems like what it ought to do.
Update
I have to laugh. Right after making the above post, including the remark about useless comments that just restate what the programmer can instantly see from looking at the code, I went back to doing some real work, and the next piece of code I looked at said:
// Set the sales status to "S" and the delivery date to today.
setSalesStatus("S");
setDeliveryDate(today);
Like, wow, thanks for that helpful comment. That cleared up all my confusion immediately. Because, you see, the REAL question that I'm trying to figure out is, Why are we setting a delivery date here? This doesn't appear to be an appropriate place to be doing that. But instead of a comment that explains the purpose of the code, all I have is a restatement of the obvious. Arggh.
Personally, I always comment my code. No long lines, but short bits like
// parser starts here
...
// mail!
etc.
However, if you have a 5 line script, commenting might be a bit overkill... especially if your filename says it.
If it's for you, and you alone, you totally should do it if you feel like it. If you're working on something with a team, I'd do it always. Just to notify the others what they're looking at.
Comments and bad design are two different things. Comments help to understand certain things in a bigger context. Comments (and documentation in general) help new folks in a project to do some reading on their own.
Actually we have some unwritten rule that we must comment our stuff. If I wonder whether or not somebody can understand this at first glance it's already a sign of a required comment.
However, it's not a sign of bad design.

Converting a VB.NET Project to a C# Project [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I'm looking for a tool (paid or OSS) to convert a mid-sized VB.NET project to a C# project. I've searched StackOverflow and have found a few questions/answers, but most suggest .NET Reflector or online copy/paste single file tools. Reflector doesn't seem to fit the bill as it will convert an assembly, but we're looking for a whole-sale project converter which will maintain the project including file names, comments, etc.
We're fully willing to manually address items that cannot be automatically converted, but would like to start off with a fairly comprehensive converted project.
One recommendation we found is Elegance Technologies' CSharpener for VB.NET - http://www.elegancetech.com/csvb/csvb.aspx. Based on their site, it hasn't been revved since pre-VS 2008.
Recommendations will be appreciated.
SharpDevelop is an open source IDE and it allows you to covert between VB and C#.
Do be aware that there are some things which can be done nicely in VB.net that cannot be done nicely, if at all in C# (and vice versa). Two of note:
In vb.net, declaration-initializations (e.g. "Dim Foo As Bar = Whatever") in a derived class occur after the base constructor has run, and can make reference to the object being constructed. In C#, such declaration-initializations occur before the base constructor is run, and cannot reference the object under construction. One could probably move all such initialization to the constructor, but if there are multiple constructors that may require the creation of redundant code.
In vb.net, a Catch statement may include a condition (e.g. Catch Ex As FancyException When Ex.SomeProperty = 9). In C#, the only way to a achieve a somewhat similar result is to catch an exception and then decide if it meets the necessary criteria, rethrowing if not; this will yield different semantics in a number of ways. Among other things, at the time the When clause is evaluated, Finally statements which will be tripped by the exception will not yet have run, so allowing the state of the system to be captured. Further, if break-on-unhandled-exception is set, and no "When" condition is satisfied, the debugger will break at the location where the original exception occurred. If the exception had been caught and rethrown, the debugger would break at the re-throw.
I would think an IL-to-C# translator might do an okay job of moving initializations to an object's constructors, though that lead to some annoying repetition. I don't think there's any way for C# code to match the semantics of VB.net's exception handling, though.
Two words: A programmer.
If you want it to be the most bug free and just work hire a programmer.
A quick google turns up http://www.freelancer.com where you can hire a one time programmer.
If you're not satisfied with SharpDevelop, TangibleSolutions will provide support with their converters to ensure your happiness.
SharpDevelop is quite good, but at my company we've found VBConversions to provide a much more complete conversion. It's a commerical app though, but for the time saved over SharpDevelop it was a no-brainer for us.
As a specific example, one thing we found that SharpDevelop didn't convert correctly was VB indexes, which use curvy brackets. It seemed unable to distinguish between indexes and method calls so didn't convert the indexes to square brackets. VBConversions converted them fine. This one thing made it worth its purchase for us.

Looking for a few good C# interview problems [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
I do not want to ask candidates questions, but rather give them several problems to resolve. The reason for this is that I've seen people be excellent with theory, but when confronted by a real world c# issue, just couldn't hack it.
These c# problems should be simple enough that it won't take more than 1-20 minutes to resolve, yet complicated enough that I'd be able to weed out candidates that can't code.
Right now, I typically ask the applicants to reverse a string and remove duplicates from a List. This alone weeds out a large number of people.
Any other examples I could use?
Edit: I should have mentioned that this is for a standard c# gig, where they'll be writing business code rather than finding the most optimal way to implement a linked list.
I like picking simple problems that I actually had to solve at some point; it doesn't get more relevant to the job than that.
When I worked on VBScript I'd ask college candidates how to write a simplified version of DateDiff, since doing so was what I did my first real day of work at Microsoft. More advanced candidates I would ask how to build a device which tracks the relationship between 32 bit handles and an associated 64 bit pointer, which again I actually had to do when working on VBScript.
More recently I tend to ask questions about tree manipulation algorithms, since the compiler is all about tree manipulation. Or about how to codegen new operators using monads, since that's how LINQ works.
My point is not that you should use questions in these areas, my point is that surely you must have had problems that you had to solve in your day-to-day work. Ask the candidates about those problems -- then you'll learn how they solve a realistic problem, and they'll learn what sorts of problems they'd be solving if they came to work with you.
dont ask for knowledge of class libraries or obscure corners of the language (unsafe, dynamic, ..); smart people can pick these up or look them up.
I would ask to design a class hierarchy to represent something real world (vehicles, animals, ...). This usually flushes out the people who dont get objects. Make them do it with interfaces too. Also make them reverse a string - no harm in oldies but goldies
I agree with you, it is surprising how many people claim to be experienced and you find out that all that they did was read the box…
I don’t know if testing for C# is as valuable as it first seems… sure you could ask them to describe an example of when they needed to use inheritance, or why casting might have a performance problem, etc. But these are easy to study for. You would be surprised at how many interviewees give the example using “car” or “color” when giving their real world example of inheritance…. Guess they are in a book somewhere.
When looking at this problem it helps me when I compare experience in development to learning Spanish. A short time into the class everyone is conjugating verbs and can pass a test on this… but nobody speaks Spanish yet. You want the guy that claims to speak Spanish and can actually do it.
So I like to be more specific with the other technologies that will tell me if they have traveled the well-worn path of development. If they say they are an ASP.Net developer I ask them simple questions, but ones that are on the path
EXAMPLES: Give me an example of where the connection string could live? If you need to pass an ID from one page to another, what are your options? If a page takes 5 minutes to load, tell me how you would go about troubleshooting it. If I had a web page that had a single button on it, how would I center that button? Tell me the difference between storing variables in the viewstate verses session state?
You don’t have to know everything, but eighty percent of the people interviewing for a senior level position will get 10% of these types of questions right. (And on 70% of the phone interviews you will hear them Googling for the answers – good thing these aren’t the types of questions you can easily Google for.)
SQL Server is about the same. They say they would rate themselves an 8 or 9 in SQL Sever development, but then get 10% of questions. The questions again are to see if you have been on the well-worn path.
EXAMPLES: If you had a table of customers and a table of orders, how would you find the customers that had no orders? What is a clustered index? If I had a table of developers and a table of projects, how would I set it up so that projects could have multiple developers on it and developers could be on multiple projects?
How could you develop in SQL Server for “years” and not have hit these concepts? A high percentage of candidates get almost none of these answers right!! (I guess the SQL Server box isn’t as informative.)
So if you say you are a senior level guy and you can say “Soy un revelador de software” (I am a software developer), but can’t say “He hecho eso antes” (I have done that before), I don’t think you are the senior level person you are claiming to be.
Now this tells you if they have been on the well-worn path, but not if they are smart and have good problem solving skills. Having gone thru a ton of these types of interviews I can tell you that by the time the process is done you will be satisfied with having enough information to have a strong opinion on both of these issues. You might also see that by then giving them a problem set to solve is unnecessary.
Show them a small section of code or architecture diagram from one of your own projects and ask them to suggest how they would refactor it. Even if you don't wind up hiring them, you might get some interesting suggestions on ways to improve your code.
Building Eric's and other answers here, but answering as an only-ever-so-far-interviewee, what I would like in an interview is a kind of pair-programming 'test', where you sit down together facing the screen, and talk through a real-world problem.
I think there would be many advantages:
For the interviewee, being in front of a screen instead of facing the interviewer makes it easier to think about the problem rather than the interview.
For the interviewer, being with the interviewee while they look through the code and ask questions about the problem space would give a much greater insight into how the interviewee thinks, how they approach problems, and how they communicate and interact with others.
I would expect that it's more important and interesting to see a candidate thinking round the edges of your real-world problem, even if they don't completely solve it, than to have them get 10 out of 10 on come algorithmic test.
Something mildly algorithmic.
Write a method that returns true if a string is a palindrome, and false otherwise.
Re-implement the String.Substring(int, int) method.
Something about object-oriented design too.
Design a checkers game (ie, define the classes and some of the methods).
One question I was asked, and subsequently ask interviewees, is"Describe how you would make this phone into an application". Have them describe the classes, their properties, methods, interfaces, etc. Then question them on why they chose to implement them in that specific way. It gives you a good idea if they understand how to code, and gives you some insight into how they approach and solve problems.
Also, if you offer a suggestion of how they could have implemented it a different way, it may show you whether they are open to new ideas, criticism, or if they are a team player or not.
Fizz Buzz

XML Commenting tips for C# programming [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
Good morning, afternoon, evening or night (depending on your timezone).
This is just a general question about XML commenting within C#. I have never been very big into commenting my programs, I've always been more of a verbose variable/property/method namer and letting the code speak for itself. I do write comments if I'm coding something that is fairly confusing, but for the most part I don't write alot of comments.
I was doing some reading about XML comments in .NET, Sandcastle, and the help file builder on codeplex and it has taken me down the path of wanting to document my code and generate some nice, helpful documentation for those who have to dig into my code when I'm no longer here.
My question is about standards and conventions. Is there a guide to "good" XML commenting? Should you comment EVERY variable and property? EVERY method? I'm just basically looking for tips on how to write good comments that will be compiled by sandcastle into good documentation so other programmers don't curse my name when they end up having to work on my code.
Thank you in advance for your advice and suggestions,
Scott Vercuski
Personally, we make sure that every public and protected method has XML comments. It also will provide you with Intellisense, and not just end-user help documentation. In the past, we also have included it on privately scoped declarations, but do not feel it is 100% required, as long as the methods are short and on-point.
Don't forget that there are tools to make you XML commenting tasks easier:
GhostDoc - Comment inheritance and templating add-in.
Sandcastle Help File Builder - Edits the Sandcastle projects via a GUI, can be run from a command line (for build automation), and can edit MAML for help topics not derived from code. (The 1.8.0.0 alpha version is very stable and very improved. Have been using it for about a month now, over 1.7.0.0)
Comments are very often outdated. This always has been a problem. My rule of thumb : the more you need to work to update a comment, the faster that comment will be obsolete.
XML Comments are great for API development. They works pretty well with Intellisens and they can have you generate an HTML help document in no time.
But this is not free: maintaining them will be hard (look at any non-trivial example, you will understand what I mean), so they will tend to be outdated very fast. As a result, reviewing XML Comments should be added to your code review as a mandatory check and this check should be performed every time a file is updated.
Well, since it is expensive to maintain, since a lot of non private symbols (in non-API development) are used only by 1 or 2 classes, and since these symboles are often self-explanatory, I would never enforce a rule saying that every non-private symbol should be XML commented. This would be overkill and conterproductive. What you will get is what I saw at a lot of places : nearly empty XML Comments adding nothing to the symbole name. And code that is just a little less readable...
I think that the very, very important guide line about comments in normal (non-API) code should not be about HOW they should be written but about WHAT they should contain. A lot of developers still don't know what to write. A description of what should be commented, with examples, would do better for your code than just a plain : "Do use XML Comments on every non-private symbole.".
I document public classes and the Public/Protected Members of those classes.
I don't document private or internal members or internal classes. Hence variables (I think you mean fields) because they are private.
The objective is to create some documentation for a developer who does not have ready access to the source code.
Endeavour to place some examples where usage is not obvious.
I very rarely comment on method variables, and equally rarely fields (since they are usually covered by a property, or simply don't exist if using auto-implemented properties).
Generally I try hard to add meaningful comments to all public/protected members, which is handy, since if you turn on the xml comments during build, you get automatic warnings for missing comments. Depending on the complexity, I might not fill out every detail - i.e. if it is 100% obvious what every parameter has to do (i.e. there is no special logic, and there is only 1 logical way of interpreting the variables), then I might get lazy and not add comments about the parameters.
But I certainly try to describe what methods, types, properties, etc represent/do.
We document the public methods/properties/etc on our libraries. As part of the build process we use NDoc to create an MSDN-like web reference. It's been very helpful for quick reference and lookup.
It's also great for Intellisense, especially with new team members or, like you said, when the original author is gone.
I agree that code, in general, should be self-explanatory. The XML documention, to me, is more about reference and lookup when you don't have the source open.
Personally my opinion is to avoid commenting. Commenting is dangerous. Because in industry we always update code(because business & requirements are always changing), but vary rarely we update our comments. This may misguide the programmers.

What's the best way of parsing strings? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
We've got a scenario that requires us to parse lots of e-mail (plain text), each e-mail 'type' is the result of a script being run against various platforms. Some are tab delimited, some are space delimited, some we simply don't know yet.
We'll need to support more 'formats' in the future too.
Do we go for a solution using:
Regex
Simply string searching (using string.IndexOf etc)
Lex/ Yacc
Other
The overall solution will be developed in C# 2.0 (hopefully 3.5)
Regex.
Regex can solve almost everything except for world peace. Well maybe world peace too.
The three solutions you stated each cover very different needs.
Manual parsing (simple text search) is the most flexible and the most adaptable, however, it very quickly becomes a real pain in the ass as the parsing required is more complicated.
Regex are a middle ground, and probably your best bet here. They are powerful, yet flexible as you can yourself add more logic from the code that call the different regex. The main drawback would be speed here.
Lex/Yacc is really only adapted to very complicated, predictable syntaxes and lacks a lot of post compile flexibility. You can't easily change parser in mid parsing, well actually you can but it's just too heavy and you'd be better using regex instead.
I know this is a cliché answer, it all really comes down to what your exact needs are, but from what you said, I would personally probably go with a bag of regex.
As an alternative, as Vaibhav poionted out, if you have several different situations that can arise and that you cna easily detect which one is coming, you could make a plugin system that chooses the right algorithm, and those algorithms could all be very different, one using Lex/Yacc in pointy cases and the other using IndexOf and regex for simpler cases.
You probably should have a pluggable system regardless of which type of string parsing you use. So, this system calls upon the right 'plugin' depending on the type of email to parse it.
You must architect your solution to be updatable, so that you can handle unknown situations when they crop up. Create an interface for parsers that contains not only methods for parsing the emails and returning results in a standard format, but also for examining the email to determine if the parser will execute.
Within your configuration, identify the type of parser you wish to use, set its configuration options, and the configuration for the identifiers which determine if a parser will act or not. Name the parsers by assembly qualified name so that the types can be instantiated at runtime even if there aren't static links to their assemblies.
Identifiers can implement an interface as well, so you can create different types that check for different things. For instance, you might create a regex identifier, which parses the email for a specific pattern. Make sure to make as much information available to the identifier, so that it can make decisions on things like from addresses as well as the content of the email.
When your known parsers can't handle a job, create a new DLL with types that implement the parser and identifier interfaces that can handle the job and drop them in your bin directory.
It depends on what you're parsing. For anything beyond what Regex can handle, I've been using ANTLR. Before you jump into recursive descent parsing for the first time, I would research how they work, before attempting to use a framework like this one. If you subscribe to MSDN Magazine, check the Feb 2008 issue where they have an article on writing one from scratch.
Once you get the understanding, learning ANTLR will be a ton easier. There are other frameworks out there, but ANTLR seems to have the most community support and public documentation. The author has also published The Definitive ANTLR Reference: Building Domain-Specific Languages.
Regex would probably be you bes bet, tried and proven. Plus a regular expression can be compiled.
Your best bet is RegEx because it provides a much greater degree of flexibility than any of the other options.
While you could use IndexOf to handle somethings, you may quickly find yourself writing code that looks like:
if(s.IndexOf("search1")>-1 || s.IndexOf("search2")>-1 ||...
That can be handled in one RegEx statement. Plus, there are a lot of place like RegExLib.com where you can find folks who have shared regular expressions to solve problems.
#Coincoin has covered the bases; I just want to add that with regex it's particularly easy to end up with hard-to-read, hard-to-maintain code. Regex is a powerful and very compact language, so that's how it often goes.
Using whitespace and comments within the regex can go a long way to make it easier to maintain regexes. Eric Gunnerson turned me on to this idea. Here's an example.
Use PCRE. All other answers are just 2nd Best.
With as little information you provided, i would choose Regex.
But what kind of information you want to parse and what you would want to do will change the decision to Lex/Yacc maybe..
But it looks like you've already made your mind up with String search :)

Categories