Why isn't string concatenation automatically converted to StringBuilder in C#? [duplicate] - c#

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Why is String.Concat not optimized to StringBuilder.Append?
One day I was ranting about a particular Telerik control to a friend of mine. I told him that it took several seconds to generate a controls tree, and after profiling I found out that it is using a string concatenation in a loop instead of a StringBuilder. After rewriting it worked almost instantaneously.
So my friend heard that and seemed to be surprised that the C# compiler didn't do that conversion automatically like the Java compiler does. Reading many of Eric Lippert's answers I realize that this feature didn't make it because it wasn't deemed worthy enough. But if, hypothetically, costs were small to implement it, what rationale would stop one from doing it?

But if, hypothetically, costs were small to implement it, what rationale would stop one from doing it?
It sounds like you're proposing a bit of a tautology: if there is no reason to not do X, then is there a reason to not do X? No.
I see little value in knowing the answers to hypothetical, counterfactual questions. Perhaps a better question to ask would be a question about the real world:
Are there programming languages that use this optimization?
Yes. In JScript.NET, we detect string concatenations in loops and the compiler turns them into calls to a string builder.
That might then be followed up with:
What are some of the differences between JScript .NET and C# that justify the optimization in the one language but not in the other?
A core assumption of JScript.NET is that its programmers are mostly going to be JavaScript programmers, and many of them will have already built libraries that must run in any implementation of ECMAScript. Those programmers might not know the .NET framework well, and even if they do, they might not be able to use StringBuilder without making their library code non-portable. It is also reasonable to assume that JavaScript programmers may be either novice programmers, or programmers who came to programming via their line of business rather than a course of study in computer science.
C# programmers are far more likely to know the .NET framework well, to write libraries that work with the framework, and to be experienced programmers who understand why looped string concatenation is O(n2) in the naive implementation. They need this optimization generated by the compiler less because they can just do it themselves if they deem it necessary.
In short: compiler features are about spending our budget to add value for the customer; you get more "bang for buck" adding the feature to JScript.NET than you do adding it to C#.

The C# compiler does better than that.
a + b + c is compiled to String.Concat(a, b, c), which is faster than StringBuilder.
"a" + "b" is compiled directly to "ab" (useful for multi-line literals).
The only place to use StringBuilder is when concatenating repetitively inside a loop; the compiler cannot easily optimize that.

Related

Why does C# also not allow empty conditions in while loops? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Edit: I changed most of my question, because it was too long and altough my question is a request of facts, it was considered opinion based. Having said that, please read the comments where I try to explain why closing this question was wrong IMHO.
Also: I'd like to appologize for my initial question, I am not a native English speaker and I didn't know the word [blindly] had such a negative tone. I actually used the word in other questions.
Background:
Consider the following piece of C# code:
for(; /*empty condition*/ ;)
{
//Infinite loop
}
This amongst other methods is considered good practice to make an infinite loop. But when we would try a similar approach with a while loop it would never compile:
while(/*empty condition*/)
{
//Compiler error
}
At first I thought that this was some sort of bug in my compiler, but then I read the C# language specification. I found this was be design. Why? Because C# is based on C, and C behaves this way.
So now the question is, why does C behave like this? Somebody else asked this question on StackOverflow already. The answer was pretty unsatisfying and came down to this:
It behaves like this because it is described like this in the C language specification.
This answer reminded me of many discussions I had with my parents when I was a kid: "Why do I have to clean my room?" - "Because we say so.". Further answers speculate (i.e. no sources or arguments were added) that while() is "hacky" and that "using for(;;) made more sense".
My research
Edit: deleted because it was considered to long. It basically was an effort to figure out why C had this construction.
My question:
After all my research I concluded that the while loop's inability to accept empty expressions is illogical if the for loop can.
But if that is true, then why did the C# language design team copy this bevahiour?
You: "C# is based on C and why would you reinvent the wheel?"
True, but why make the same illogical decissions? If your grandfather would jump of a bridge, then would you do it too, just because you are based on him? And isn't the creation of a new language - based on an old one - the ideal situation to avoid/fix the illogical pitfalls of the old language?
So to repeat my question:
Why did the C# design team copy this behaviour?
After all my research I can only conclude that the while loop's inability to accept empty expressions is illogical.
A very far fetched conclusion. IMHO it is the for(;;) loop that is illogical (and not only in this respect).
It is clear that while() { ... } would have been possible but what exactly is the merit?
As a matter of style I would prefer for(;true;) over for(;;), it has less chance of being misread.
Being able to write a 'for-ever' loop is a minor issue, avoiding typos is much more important.
Readability is the only thing that counts, you're not making much of a case for while().
And what should happen in this statement?
if() Foo();
C11 specification states:
The statement
for ( clause-1 ; expression-2 ; expression-3 ) statement
behaves as follows: The expression expression-2 is the
controlling expression that is evaluated before each execution of the
loop body. The expression expression-3 is evaluated as a void
expression after each execution of the loop body. If clause-1 is a
declaration, the scope of any identifiers it declares is the remainder
of the declaration and the entire loop, including the other two
expressions; it is reached in the order of execution before the first
evaluation of the controlling expression. If clause-1 is an
expression, it is evaluated as a void expression before the first
evaluation of the controlling expression. 158)
Both clause-1 and expression-3 can be omitted. An omitted expression-2 is replaced by a
nonzero constant.
which is not the case with the while-loop. However this is C11. ANSI C doesn't make this clear really. Though, I assume C# is based on how C most commonly work(s|ed), not how it's specified to work.
To speculate, I could think that the for-loop in early C wasn't well defined, so programmers found out that you can write an infinite loop like for (;;). To be compatible with old programs, the standards never forbid this. So there is really no reason to write like this. It's just history I guess.

Developing Abstract Syntax Tree

I've scoured the internet looking for some newbie information on developing a C# Abstract Syntax Trees but I can only find information for people already 'in-the-know'. I am a line-of-business application developer so topics like these are a bit over my head, but this is for my own education so I'm willing to spend the time and learn whatever concepts are necessary.
Generally, I'd like to learn about the techniques behind developing an abstract representation of code from a code string. More specifically, I'd like to be able to use this AST to do C# syntax highlighting. (I realize that syntax highlighting doesn't necessary need an AST, but this seems like a good opportunity to learn some "compiler"-level techniques.)
I apologize if this question is a bit broad, but I'm not sure how else to ask.
Thanks!
First you need to understand what parsing is, and what abstract syntax trees are. For this, you can consult Wikipedia on abstract syntax trees for a first look.
You really need to spend some time with a compiler text book to understand how abstract syntax trees are related to parsing, and can be constructed while parsing; the classic reference is Aho/Ullman/Sethi's "Compilers" book (easily found on the web). You may find the SO answer to Are there any "fun" ways to learn about Languages, Grammars, Parsing and Compilers? instructive.
Once you understand how to build an AST for a simple grammar, you can then turn your attention to something like C#. The issue here is sheer scale; it is one thing to play with a toy language with 20 grammar rules. It is another to work with grammar of several hundred or a thousand rules. Experience will small ones will make it a lot easier to understand how the big ones are put together, and how to live with them.
You probably don't want to build your own C# grammar (or implement the one from the C# standard); its quite a lot of work. You can get available tools that will hand you C# ASTs (Roslyn has already been mentioned; ANTLR has a C# parser, there are many more).
It is true that you might use an AST for syntax highlighting (although that is probably killing a gnat with a sledgehammer). What most people don't think much about (but the compiler books emphasize), is what happens after you have an AST; mostly they aren't useful by themselves. You actually need a lot more machinery to do anything interesting.
Rather than repeat this over and over (I keep seeing the same kind of questions), you can see my discussion on Life After Parsing for more details.
You should probably take a look at this talk by Phil Trelford:
Write your own compiler in 24 hours
This man is a genius, and will leave you fired up to learn about compilers. He explains it literally easily enough for a five year old to understand. The five year old in question is his son, so probably has an unfair advantage, but five is five.
Take a look at Roslyn. I think it could be what you're looking for. It gives you access to the compilers AST, among lots of other amazing things!
http://blogs.msdn.com/b/visualstudio/archive/2011/10/19/introducing-the-microsoft-roslyn-ctp.aspx
Beyond that, I suggest a textbook on compilers.

Converting a VB.NET Project to a C# Project [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I'm looking for a tool (paid or OSS) to convert a mid-sized VB.NET project to a C# project. I've searched StackOverflow and have found a few questions/answers, but most suggest .NET Reflector or online copy/paste single file tools. Reflector doesn't seem to fit the bill as it will convert an assembly, but we're looking for a whole-sale project converter which will maintain the project including file names, comments, etc.
We're fully willing to manually address items that cannot be automatically converted, but would like to start off with a fairly comprehensive converted project.
One recommendation we found is Elegance Technologies' CSharpener for VB.NET - http://www.elegancetech.com/csvb/csvb.aspx. Based on their site, it hasn't been revved since pre-VS 2008.
Recommendations will be appreciated.
SharpDevelop is an open source IDE and it allows you to covert between VB and C#.
Do be aware that there are some things which can be done nicely in VB.net that cannot be done nicely, if at all in C# (and vice versa). Two of note:
In vb.net, declaration-initializations (e.g. "Dim Foo As Bar = Whatever") in a derived class occur after the base constructor has run, and can make reference to the object being constructed. In C#, such declaration-initializations occur before the base constructor is run, and cannot reference the object under construction. One could probably move all such initialization to the constructor, but if there are multiple constructors that may require the creation of redundant code.
In vb.net, a Catch statement may include a condition (e.g. Catch Ex As FancyException When Ex.SomeProperty = 9). In C#, the only way to a achieve a somewhat similar result is to catch an exception and then decide if it meets the necessary criteria, rethrowing if not; this will yield different semantics in a number of ways. Among other things, at the time the When clause is evaluated, Finally statements which will be tripped by the exception will not yet have run, so allowing the state of the system to be captured. Further, if break-on-unhandled-exception is set, and no "When" condition is satisfied, the debugger will break at the location where the original exception occurred. If the exception had been caught and rethrown, the debugger would break at the re-throw.
I would think an IL-to-C# translator might do an okay job of moving initializations to an object's constructors, though that lead to some annoying repetition. I don't think there's any way for C# code to match the semantics of VB.net's exception handling, though.
Two words: A programmer.
If you want it to be the most bug free and just work hire a programmer.
A quick google turns up http://www.freelancer.com where you can hire a one time programmer.
If you're not satisfied with SharpDevelop, TangibleSolutions will provide support with their converters to ensure your happiness.
SharpDevelop is quite good, but at my company we've found VBConversions to provide a much more complete conversion. It's a commerical app though, but for the time saved over SharpDevelop it was a no-brainer for us.
As a specific example, one thing we found that SharpDevelop didn't convert correctly was VB indexes, which use curvy brackets. It seemed unable to distinguish between indexes and method calls so didn't convert the indexes to square brackets. VBConversions converted them fine. This one thing made it worth its purchase for us.

Coding guidelines + Best Practices? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
Improve this question
I couldn't find any question that directly applies to my query so I am posting this as a new question. If there is any existing discussion that may help me, please point it out and close the question.
Question:
I am going to do a presentation on C# coding guidelines but it is not supposed to limit to coding standards.
So I have a rough idea but I think I need to address good programing practices. So the contents will be something like this.
Basic coding standards - Casing, Formatting etc.
Good practices - Usage of Hashset over other data structures, String vs String Builder, String's immutability and using them effectively etc
Really I would like to add more good practices (Especially to improve the performance.) So like to hear some more good practices to be used with C#. Any suggestions??? (No need of large descriptions :) Just the idea is sufficient.)
Coding Guidelines for CSharp 3.0 and 4.0
IDesign Coding Standards
Lance Hunt's C# Coding Standards
Brad Abrams' Internal Coding Guidelines
Unsurprisingly, I just found a SO question: C# Coding standard / Best practices
Here are a few tips:
Use FxCop for static analysis.
Use StyleCop for coding style validation.
Because of the different semantics of value types, supply them with an alternative color in the IDE (go to Tools / Options / Environment / Fonts and Colors / Display Items and supply User Types (Enums) and User Types (Value types) with a value like #DF7120 [223, 113, 32]).
Because exceptions tend to show bugs in your code, let the IDE break on all exceptions. (go to Debug / Exceptions... / Common Language Runtime Exceptions and check Throw).
Project settings: Disallow unsafe code.
Project settings: Threat warnings as errors.
Project settings: Check for arithmetic overflow/underflow.
Use variables for a single, well defined goal.
Don't use magic numbers.
Write short methods. A method should only contain one level of abstraction.
A method can never be too small (a method of 20 lines is considered pretty big).
A method should protect itself against bad input.
Consider making a type immutable.
Don't suppress warnings in your code with pragma warning disable.
Don't comment bad code: rewrite it.
Document explicitly in code why you are swallowing an exception.
Note the performance implications of concatenating strings.
Never use goto statements.
Fail early, fail fast.
I'm using Microsoft's Design Guidelines for Developing Class Libraries.
And I think it is quite good to start with.
Basic Coding Standards - Make sure it's consistent. Even if they don't follow the conventions set out in this document on msdn. I think consistency is really key here.
Unit Tests - You cannot go wrong here.
Security - Talk about ensuring that if you are passing sensitive data around that it's secure.
Performance - You know, I personally feel that getting the application right and then looking at performance is what I do. I do have it in the back of my mind when writing code, so it's little fine tunings that come in at the end.

What's the best way of parsing strings? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
We've got a scenario that requires us to parse lots of e-mail (plain text), each e-mail 'type' is the result of a script being run against various platforms. Some are tab delimited, some are space delimited, some we simply don't know yet.
We'll need to support more 'formats' in the future too.
Do we go for a solution using:
Regex
Simply string searching (using string.IndexOf etc)
Lex/ Yacc
Other
The overall solution will be developed in C# 2.0 (hopefully 3.5)
Regex.
Regex can solve almost everything except for world peace. Well maybe world peace too.
The three solutions you stated each cover very different needs.
Manual parsing (simple text search) is the most flexible and the most adaptable, however, it very quickly becomes a real pain in the ass as the parsing required is more complicated.
Regex are a middle ground, and probably your best bet here. They are powerful, yet flexible as you can yourself add more logic from the code that call the different regex. The main drawback would be speed here.
Lex/Yacc is really only adapted to very complicated, predictable syntaxes and lacks a lot of post compile flexibility. You can't easily change parser in mid parsing, well actually you can but it's just too heavy and you'd be better using regex instead.
I know this is a cliché answer, it all really comes down to what your exact needs are, but from what you said, I would personally probably go with a bag of regex.
As an alternative, as Vaibhav poionted out, if you have several different situations that can arise and that you cna easily detect which one is coming, you could make a plugin system that chooses the right algorithm, and those algorithms could all be very different, one using Lex/Yacc in pointy cases and the other using IndexOf and regex for simpler cases.
You probably should have a pluggable system regardless of which type of string parsing you use. So, this system calls upon the right 'plugin' depending on the type of email to parse it.
You must architect your solution to be updatable, so that you can handle unknown situations when they crop up. Create an interface for parsers that contains not only methods for parsing the emails and returning results in a standard format, but also for examining the email to determine if the parser will execute.
Within your configuration, identify the type of parser you wish to use, set its configuration options, and the configuration for the identifiers which determine if a parser will act or not. Name the parsers by assembly qualified name so that the types can be instantiated at runtime even if there aren't static links to their assemblies.
Identifiers can implement an interface as well, so you can create different types that check for different things. For instance, you might create a regex identifier, which parses the email for a specific pattern. Make sure to make as much information available to the identifier, so that it can make decisions on things like from addresses as well as the content of the email.
When your known parsers can't handle a job, create a new DLL with types that implement the parser and identifier interfaces that can handle the job and drop them in your bin directory.
It depends on what you're parsing. For anything beyond what Regex can handle, I've been using ANTLR. Before you jump into recursive descent parsing for the first time, I would research how they work, before attempting to use a framework like this one. If you subscribe to MSDN Magazine, check the Feb 2008 issue where they have an article on writing one from scratch.
Once you get the understanding, learning ANTLR will be a ton easier. There are other frameworks out there, but ANTLR seems to have the most community support and public documentation. The author has also published The Definitive ANTLR Reference: Building Domain-Specific Languages.
Regex would probably be you bes bet, tried and proven. Plus a regular expression can be compiled.
Your best bet is RegEx because it provides a much greater degree of flexibility than any of the other options.
While you could use IndexOf to handle somethings, you may quickly find yourself writing code that looks like:
if(s.IndexOf("search1")>-1 || s.IndexOf("search2")>-1 ||...
That can be handled in one RegEx statement. Plus, there are a lot of place like RegExLib.com where you can find folks who have shared regular expressions to solve problems.
#Coincoin has covered the bases; I just want to add that with regex it's particularly easy to end up with hard-to-read, hard-to-maintain code. Regex is a powerful and very compact language, so that's how it often goes.
Using whitespace and comments within the regex can go a long way to make it easier to maintain regexes. Eric Gunnerson turned me on to this idea. Here's an example.
Use PCRE. All other answers are just 2nd Best.
With as little information you provided, i would choose Regex.
But what kind of information you want to parse and what you would want to do will change the decision to Lex/Yacc maybe..
But it looks like you've already made your mind up with String search :)

Categories