I have heard people state that code generators and T4 templates should not be used. The logic behind that is that if you are generating code with a generator, then there is a better, more efficient way to build the code through generics and templating.
While I slightly agree with the statement above, I have not really found effective ways to build templates that can, for instance, instantiate themselves. In other words, I can never do:
return new T();
Additionally, if I want to generate code based on database values, I have found that using Microsoft.SqlServer.Management.SMO in conjunction with T4 templates has been wonderful for generating mass amounts of code without having to copy/paste or use ReSharper.
Another problem I have found with generics is that, to my shock, there are a lot of developers who do not understand them. When I do examine generics for a solution, there are times when it gets complicated because C# says I cannot do something that seems perfectly logical in my mind.
What are your thoughts? Do you prefer to build a generator, or do you prefer to use generics? Also, how far can generics go? I know a decent amount about generics, but there are traps and pitfalls that I always run into that cause me to resort to a T4 template.
What is the more proper way to handle scenarios where you need a large amount of flexibility? Oh and as a bonus to this question, what are good resources on C# and Generics?
You can do new T(); if you add a new() constraint:
public class Meh<T>
    where T : new()
{
    public static T CreateOne()
    {
        return new T();
    }
}
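As a quick usage sketch (Widget is an invented example type with a public parameterless constructor):

Widget w = Meh<Widget>.CreateOne();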
As for code generators: I use one every day without any problems. I'm using one right now, in fact :-)
Generics solve one problem, code-generators solve another. For example, creating a business model using a UML editor and then generating your classes with persistence code as I do all of the time using this tool couldn't be achieved with generics, because each persistent class is completely different.
As for a good source on generics. The best has got to be Jon Skeet's book of course! :-)
As the originator of T4, I've had to defend this question quite a few times as you can imagine :-)
My belief is that at its best code generation is a step on the way to producing equivalent value using reusable libraries.
As many others have said, the key concept to maintain DRY is never, ever changing generated code manually, but rather preserving your ability to regenerate when the source metadata changes or you find a bug in the code generator. At that point the generated code has many of the characteristics of object code and you don't run into copy/paste type problems.
In general, it's much less effort to produce a parameterized code generator (especially with template-based systems) than it is to correctly engineer a high quality base library that gets the usage cost down to the same level, so it's a quick way to get value from consistency and remove repetition errors.
However, I still believe that the finished system would most often be improved by having less total code. If nothing else, its memory footprint would almost always be significantly smaller (although folks tend to think of generics as cost free in this regard, which they most certainly are not).
If you've realised some value using a code generator, then this often buys you some time or money or goodwill to invest in harvesting a library from the generated codebase. You can then incrementally reengineer the code generator to target the new library and hopefully generate much less code. Rinse and repeat.
One interesting counterpoint that has been made to me and that comes up in this thread is that rich, complex, parametric libraries are not the easiest thing in terms of learning curve, especially for those not deeply immersed in the platform. Sticking with code generation onto simpler basic frameworks can produce verbose code, but it can often be quite simple and easy to read.
Of course, where you have a lot of variance and extremely rich parameterization in your generator, you might just be trading off complexity in your product for complexity in your templates. This is an easy path to slide into and can make maintenance just as much of a headache - watch out for that.
Generating code isn't evil and it doesn't smell! The key is to generate the right code at the right time. I think T4 is great--I only use it occasionally, but when I do it is very helpful. To say, unconditionally, that generating code is bad is unconditionally crazy!
It seems to me code generators are fine as long as the code generation is part of your normal build process, rather than something you run once and then keep its output. I add this caveat because if you just use the code generator once and discard the data that created it, you're just automatically creating a massive DRY violation and maintenance headache; whereas generating the code every time effectively means that whatever you are using to do the generating is the real source code, and the generated files are just intermediate compile stages that you should mostly ignore.
Lex and yacc are classic examples of tools that allow you to specify functionality in an efficient manner and generate efficient code from it. Trying to do their jobs by hand will lengthen your development time and probably produce less efficient and less readable code. And while you could certainly incorporate something like lex and yacc directly into your code and do their jobs at run time instead of at compile time, that would certainly add considerable complexity to your code and slow it down. If you actually need to change your specification at run time it might be worth it, but in most normal cases using lex/yacc to generate code for you at compile time is a big win.
A good percentage of what is in Visual Studio 2010 would not be possible without code generation. Entity Framework would not be possible. The simple act of dragging and dropping a control onto a form would not be possible, nor would Linq. To say that code generation should not be used is strange as so many use it without even thinking about it.
Maybe it is a bit harsh, but for me code generation smells.
The fact that code generation is used usually means there are numerous underlying common principles which could instead be expressed in a "Don't repeat yourself" fashion. It may take a bit longer, but it is satisfying when you end up with classes that only contain the bits that really change, built on an infrastructure that contains the mechanics.
As to Generics...no I don't have too many issues with it. The only thing that currently doesn't work is saying that
List<Animal> a = new List<Animal>();
List<object> o = a;
But something close to that will be possible in the next version of C# (C# 4), whose variance support lets you assign a List<Animal> to an IEnumerable<object>; List<T> itself remains invariant.
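A minimal sketch of what that variance support allows, assuming a reference type Animal (the covariance applies to the IEnumerable<out T> interface, not to List<T> itself):

using System.Collections.Generic;

class Animal { }

class CovarianceDemo
{
    static void Main()
    {
        List<Animal> a = new List<Animal>();
        // List<object> o = a;        // still illegal: List<T> is invariant
        IEnumerable<object> o = a;    // legal in C# 4+: IEnumerable<out T> is covariant
    }
}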
Code generation is for me a workaround for many problems found in languages, frameworks, etc. Generators are not evil by themselves; I would say it is very, very bad (i.e. evil) to release a language (C#) and framework that force you to copy and paste (swapping properties, event triggering, lack of macros) or use magic strings (WPF binding).
So, I cry, but I use them, because I have to.
I've used T4 for code generation and also Generics. Both are good, have their pros and cons, and are suited for different purposes.
In my case, I use T4 to generate Entities, DAL and BLL based on a database schema. However, DAL and BLL reference a mini-ORM I built, based on Generics and Reflection. So I think you can use them side by side, as long as you keep in control and keep it small and simple.
T4 generates static code, while a generics-based approach is dynamic. If you use generics this way, you typically also use Reflection, which is said to be less performant than a "hard-coded" solution. Of course, you can cache reflection results.
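As a hedged illustration of that caching idea (the class and method names here are my own, not from any particular library), the PropertyInfo lookup can be done once and reused:

using System;
using System.Collections.Generic;
using System.Reflection;

static class PropertyCache
{
    // Cache PropertyInfo per "TypeName.PropertyName" so reflection metadata is
    // resolved only once, not on every access. Sketch only: not thread-safe.
    static readonly Dictionary<string, PropertyInfo> cache = new Dictionary<string, PropertyInfo>();

    public static object Get(object instance, string propertyName)
    {
        string key = instance.GetType().FullName + "." + propertyName;
        PropertyInfo prop;
        if (!cache.TryGetValue(key, out prop))
        {
            prop = instance.GetType().GetProperty(propertyName);
            cache[key] = prop;
        }
        return prop.GetValue(instance, null);
    }
}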
Regarding "return new T();", I use Dynamic Methods like this:
using System;
using System.Reflection;
using System.Reflection.Emit;

public class ObjectCreateMethod
{
    delegate object MethodInvoker();
    MethodInvoker methodHandler = null;

    public ObjectCreateMethod(Type type)
    {
        CreateMethod(type.GetConstructor(Type.EmptyTypes));
    }

    public ObjectCreateMethod(ConstructorInfo target)
    {
        CreateMethod(target);
    }

    void CreateMethod(ConstructorInfo target)
    {
        // Emit a tiny dynamic method that simply calls the constructor and returns the new object.
        DynamicMethod dynamic = new DynamicMethod(string.Empty,
            typeof(object),
            new Type[0],
            target.DeclaringType);
        ILGenerator il = dynamic.GetILGenerator();
        il.DeclareLocal(target.DeclaringType);
        il.Emit(OpCodes.Newobj, target);
        il.Emit(OpCodes.Stloc_0);
        il.Emit(OpCodes.Ldloc_0);
        il.Emit(OpCodes.Ret);
        methodHandler = (MethodInvoker)dynamic.CreateDelegate(typeof(MethodInvoker));
    }

    public object CreateInstance()
    {
        return methodHandler();
    }
}
Then, I call it like this:
ObjectCreateMethod _MetodoDinamico = new ObjectCreateMethod(info.PropertyType);
object _nuevaEntidad = _MetodoDinamico.CreateInstance();
More code means more complexity. More complexity means more places for bugs to hide, which means longer fix cycles, which in turn means higher costs throughout the project.
Whenever possible, I prefer to minimize the amount of code to provide equivalent functionality; ideally using dynamic (programmatic) approaches rather than code generation. Reflection, attributes, aspects and generics provide lots of options for a DRY strategy, leaving generation as a last resort.
Generics and code generation are two different things. In some cases you could use generics instead of code generation and for those I believe you should. For the other cases code generation is a powerful tool.
For all the cases where you simply need to generate code based on some data input, code generation is the way to go. The most obvious, but by no means the only example is the forms editor in Visual Studio. Here the input is the designer data and the output is the code. In this case generics is really no help at all, but it is very nice that VS simply generates the code based on the GUI layout.
Code generators could be considered a code smell that indicates a flaw or lack of functionality in the target language.
For example, while it has been said here that "Objects that persist can not be generalized", it would be better to think of it as "Objects in C# that automatically persist their data can not be generalized in C#", because I surely can in Python through the use of various methods.
The Python approach could, however, be emulated in static languages through the use of operator[ ](method_name as string), which either returns a functor or a string, depending on requirements. Unfortunately that solution is not always applicable, and returning a functor can be inconvenient.
The point I am making is that code generators indicate a flaw in the chosen language, one that is addressed by providing a more convenient, specialised syntax for the specific problem at hand.
The copy/paste type of generated code (like ORMs make) can also be very useful...
You can create your database, and then having the ORM generate a copy of that database definition expressed in your favorite language.
The advantage comes when you change your original definition (the database), press compile, and the ORM (if you have a good one) regenerates your copy of the definition. Now all references to your database can be checked by the compiler's type checker, and your code will fail to compile when you're using tables or columns that no longer exist.
Think about this: If I call a method a few times in my code, am I not referring to the name I gave to this method originally? I keep repeating that name over and over... Language designers recognized this problem and came up with "Type-safety" as the solution. Not removing the copies (as DRY suggests we should do), but checking them for correctness instead.
The ORM-generated code brings the same solution when referring to table and column names. Not removing the copies/references, but bringing the database definition into your (type-safe) language where you can refer to classes and properties instead. Together with the compiler's type checking, this solves a similar problem in a similar way: guaranteeing compile-time errors instead of runtime ones when you refer to outdated or misspelled tables (classes) or columns (properties).
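As a hedged illustration (the table and property names are invented, not taken from any particular ORM), this is the kind of generated class being described:

// Hypothetical generated entity mirroring a "Customer" table. If the Name column
// is renamed in the database and the code is regenerated, every reference to
// customer.Name becomes a compile-time error instead of a runtime failure.
public partial class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }   // maps to the Customer.Name column
}

// Compile-time checked access:   string n = customer.Name;
// Versus string-based access:    string n = (string)row["Name"];   // a typo only fails at run time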
quote:
I have not really found effective ways to build templates that can, for instance, instantiate themselves. In other words, I can never do:
return new T();
public abstract class MehBase<TSelf, TParam1, TParam2>
    where TSelf : MehBase<TSelf, TParam1, TParam2>, new()
{
    public static TSelf CreateOne()
    {
        return new TSelf();
    }
}

public class Meh<TParam1, TParam2> : MehBase<Meh<TParam1, TParam2>, TParam1, TParam2>
{
    public void Proof()
    {
        Meh<TParam1, TParam2> instanceOfSelf1 = Meh<TParam1, TParam2>.CreateOne();
        Meh<int, string> instanceOfSelf2 = Meh<int, string>.CreateOne();
    }
}
Why does being able to copy/paste really, really fast make it any more acceptable?
That's the only justification for code generation that I can see.
Even if the generator provides all the flexibility you need, you still have to learn how to use that flexibility - which is yet another layer of learning and testing required.
And even if it runs in zero time, it still bloats the code.
I rolled my own data access class. It knows everything about connections, transactions, stored procedure parms, etc, etc, and I only had to write all the ADO.NET stuff once.
It's now been so long since I had to write (or even look at) anything with a connection object in it, that I'd be hard pressed to remember the syntax offhand.
Code generation, like generics, templates, and other such shortcuts, is a powerful tool. And as with most powerful tools, it amplifies the capability of its user for good and for evil - they can't be separated.
So if you understand your code generator thoroughly, anticipate everything it will produce and why, and intend it to do so for valid reasons, then have at it. But don't use it (or any other technique) to get you past a place where you're not too sure where you're headed, or how to get there.
Some people think that, if you get your current problem solved and some behavior implemented, you're golden. It's not always obvious how much cruft and opaqueness you leave in your trail for the next developer (which might be yourself.)
I was studying Reflection; I understand some of it, but I am not getting everything related to this concept. Why do we need Reflection? What things could we not achieve without it?
There are many, many scenarios that reflection enables, but I group them primarily into two buckets.
Reflection enables us to write code that analyzes other code.
Consider for example the most basic question about an assembly: what types are in it? Assemblies are self-describing and reflection is the mechanism by which that description is surfaced to other code.
Suppose for example you wanted to write a program which took an assembly and did a graphical display of the relationships between the various classes in that assembly, to help you understand that code. There are such tools. They're in Visual Studio. Someone wrote those tools. They did not appear by magic. Reflection is the mechanism designed into the .NET framework that enables you or me or anyone else to write tools that understand code.
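A minimal sketch of that first bucket - reflection as code analysis - might look like this (the assembly inspected here is just the executing one; a real tool would load the assembly under analysis):

using System;
using System.Reflection;

class TypeLister
{
    static void Main()
    {
        // For a real analysis tool this would be Assembly.LoadFrom("Some.dll").
        Assembly asm = Assembly.GetExecutingAssembly();
        foreach (Type t in asm.GetTypes())
        {
            Console.WriteLine(t.FullName);
            foreach (MethodInfo m in t.GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly))
                Console.WriteLine("  " + m.Name);
        }
    }
}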
Reflection enables us to move compile time bindings to runtime.
Suppose you have a static method Foo.Bar(). When you put a call to Foo.Bar() in your program, you know with 100% certainty that the method you think is going to be called is actually going to be called. We call static methods "static" because the binding from the name Bar to the code that gets called can be understood statically -- that is, without running the program.
Now consider a virtual method Blah() on a base class. When you call whatever.Blah() you don't know exactly which Blah() will be called at compile time, but you know that some method Blah() with no arguments will be called on some type that is the runtime type of whatever, and that type is equal to or derived from the type which declares Blah(). (In fact you know more: you know that it is equal to or derived from the compile time type of whatever.) Virtual binding is a form of dynamic binding, but it is not fully dynamic. There's no way for the user to decide that this call should be to a different method on a different type hierarchy.
Reflection enables us to make calls that are bound entirely at runtime, based entirely on user choices if we like. We pay a performance penalty, and we lose compile-time type safety, but we gain the flexibility to decide 100% at runtime what code we call. There are scenarios where that's a reasonable tradeoff.
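A minimal sketch of that fully runtime-bound call (the type and method names are hard-coded here, but they could just as easily come from user input or a config file):

using System;
using System.Reflection;

class LateBoundDemo
{
    static void Main()
    {
        // Neither the type nor the method is known to the compiler as a static call.
        object target = Activator.CreateInstance(Type.GetType("System.Text.StringBuilder"));
        MethodInfo append = target.GetType().GetMethod("Append", new[] { typeof(string) });
        append.Invoke(target, new object[] { "hello" });
        Console.WriteLine(target);   // prints "hello"
    }
}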
Reflection is such a deep part of the .NET framework that you often don't know that you're doing it (see Attributes and LINQ for instance). And when you do know you're doing it, even if it feels wrong, it might be the only way to achieve a particular objective.
Apart from the two broad areas that Eric mentioned here are a few others. There are lots more, these are just some that come to mind immediately.
Serialization (and similar)
Whether you're using XML or JSON or rolling your own, serializing objects is much easier when you don't have to write specific code for each class to enable serialization. Reflection enables you to enumerate the properties in your object that have been flagged for (or not flagged against) serialization and write them to the output.
This isn't about saving state though. Reflection allows us to write generic methods that can produce business output too, like CSV or XLSX files from an arbitrary collection. I get a lot of mileage out of my ToCSV(...) and ToExcel(...) extensions for things like producing downloadable versions of data sets on my web-based reporting.
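A minimal sketch of that kind of reflection-driven CSV extension (not the author's actual ToCSV, and with no escaping of commas or quotes):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class CsvExtensions
{
    public static string ToCsv<T>(this IEnumerable<T> items)
    {
        // Enumerate the public instance properties once, then write one CSV row per item.
        PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
        string header = string.Join(",", props.Select(p => p.Name));
        IEnumerable<string> rows = items.Select(
            item => string.Join(",", props.Select(p => (p.GetValue(item, null) ?? "").ToString())));
        return header + Environment.NewLine + string.Join(Environment.NewLine, rows);
    }
}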
Accessing Hidden Data
Yes, I know, this is a dodgy one. And yeah, Eric is probably going to slap me for this, but...
There's a lot of code out there - I'm looking at you, ASP.NET - that hides interesting and useful stuff behind private or protected. Sometimes the only way to get them out is to use reflection. Sometimes it's not the only way, but it can be the simpler way.
Attributes
Every time you tag an Attribute onto one of your classes, methods, etc. you are implicitly providing data that is going to be accessed through reflection. Want to use those attributes yourself? Reflection is the only way you can get at them.
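A minimal sketch of reading an attribute back via reflection (the TableAttribute and Invoice types here are invented for illustration, not framework types):

using System;

[AttributeUsage(AttributeTargets.Class)]
class TableAttribute : Attribute
{
    public string Name { get; private set; }
    public TableAttribute(string name) { Name = name; }
}

[Table("Invoices")]
class Invoice { }

class AttributeDemo
{
    static void Main()
    {
        // The attribute data is only reachable through reflection.
        var attr = (TableAttribute)Attribute.GetCustomAttribute(typeof(Invoice), typeof(TableAttribute));
        Console.WriteLine(attr.Name);   // prints "Invoices"
    }
}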
LINQ and Other Expressions
This is really important stuff these days. If you've ever used LINQ to SQL, Entity Frameworks, etc. then you've used Expression in some way. You write a simple little POCO to represent a row in your database table and everything else gets handled by reflection. When you write a predicate expression the system is using the reflection model to build structures that are then processed (visited) to build an SQL statement.
Expressions aren't just for LINQ either; you can do some really interesting things yourself once you know what you're doing. I have code to generate line parsers for CSV import that run pretty damn quickly when compiled to Func<string, TRecord>. These days I tend to use a mapper somebody else wrote, but at the time I needed to shave a few more percent off the total import time for a file with 20K records that was uploaded to a website periodically.
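A minimal sketch of the expression-compilation idea (not the author's parser; names are invented): build a delegate once, then call it with no per-call reflection cost.

using System;
using System.Linq.Expressions;

static class FastAccess
{
    // Compiles x => (object)x.PropertyName into a reusable delegate.
    public static Func<T, object> BuildGetter<T>(string propertyName)
    {
        ParameterExpression x = Expression.Parameter(typeof(T), "x");
        Expression body = Expression.Convert(Expression.Property(x, propertyName), typeof(object));
        return Expression.Lambda<Func<T, object>>(body, x).Compile();
    }
}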
P/Invoke Marshalling
This one is a big deal behind the scenes and occasionally in the foreground too. When you want to call a Windows API function or use a native DLL, P/Invoke gives you ways to achieve this without having to mess about with building memory buffers in both directions. The marshalling methods use reflection to do translation of certain things - strings and so on being the obvious example - so that you don't have to get your hands dirty. All based on the Type object that is the foundation of reflection.
Fact is, without reflection the .NET framework wouldn't be what it is. No Attributes, no Expressions, probably a lot less interop between the languages. No automatic marshalling. No LINQ... at least in the way we often use it now.
I'm working on an application where I'd like to dynamically generate code for a numerical calculation (for performance). Doing this calculation as a data driven operation is too slow. To describe my requirements, consider this class:
class Simulation
{
    Dictionary<string, double> nodes;
    double t, dt;

    private void ProcessOneSample()
    {
        t += dt;
        // Expensive operation that computes the state of nodes at the current t.
    }

    public void Process(int N, IDictionary<string, double[]> Input, IDictionary<string, double[]> Output)
    {
        for (int i = 0; i < N; ++i)
        {
            foreach (KeyValuePair<string, double[]> j in Input)
                nodes[j.Key] = j.Value[i];
            ProcessOneSample();
            foreach (KeyValuePair<string, double[]> j in Output)
                j.Value[i] = nodes[j.Key];
        }
    }
}
What I want to do is JIT compile a function that implements the outer loop in Process. The code that defines this function will be generated by the data that is currently used to implement ProcessOneSample. To clarify what I'm expecting, I'd expect all of the dictionary lookups to be performed once in the compilation process (i.e. the JIT compile would bind directly to the relevant object in the dictionary), so that when the compiled code is actually executed, it is as if all of the lookups had been hardcoded.
What I'm trying to figure out is what the best tools are to tackle this problem. The reason I'm asking this question is because there are so many options:
Use Roslyn. Current stumbling block is how to bind expressions in the syntax to member variables from the host program (i.e. the values in the 'state' dictionary). Is this possible?
Use LINQ Expressions (Expression.Compile).
Use CodeDom. Just recently became aware of this in my google searching, and what prompted this question. I'm not too stoked on stumbling my way through a third compilation framework in .Net.
My original plan before I knew any of these tools existed was to call native code (x86) that I JIT compiled myself. I have some experience with this, but there are a lot of unknowns here that I have not solved yet. This is also my backup option if the performance of the above options is not sufficient. I'd prefer one of the above 3 solutions because I am sure they will be much, much simpler, assuming I can get one of them to work!
Does anyone have any experience with something like this that they would be able to share?
I'm not sure I understand your example, nor that code generation is the best way to improve its performance.
But if you want to understand the code generation options, first consider your requirements. Performance is what you want, but there is the performance of the code generation and the performance of the generated code. These are definitely not the same thing. Then there is the writability and readability of your code. Different options have very different scores on this one.
Your first option is Reflection.Emit, especially DynamicMethod. Reflection.Emit is a pretty low level API, and is pretty efficient (i.e. the code generation has good performance). Furthermore, because you have complete control of the code being generated, you have the potential to generate the most efficient code (or to generate very bad code, obviously). Also, you are not restricted to what a language such as C# allows you to do, the full power of the CLR is at your fingertips. The biggest problem with Reflection.Emit is the large volume of code you need to write, and the deep knowledge of IL required to do so. Writing that code is not easy, nor is afterwards reading or maintaining it.
Linq.Expressions, more specifically the Compile method provide a nice alternative. You can think of this as being essentially a type-safe wrapper around DynamicMethod generation with Reflection.Emit. There is some overhead in generating the code, which would probably not be a big problem. As for freedom of expression, you can do pretty much everything you can do in a normal C# method. You do not have complete control over the generated code, but the quality is generally very good. The biggest advantage of this approach is that it is much easier to write and read a program using this technique.
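To make the "bind the lookups once" idea concrete, here is a hedged sketch using Linq.Expressions (the member names are my own, not from the question): the dictionary lookup is resolved while building the expression, and the resolved array is baked in as a constant, so the compiled delegate never touches the dictionary again.

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class BindingSketch
{
    public static Func<int, double> BindColumn(IDictionary<string, double[]> input, string key)
    {
        double[] column = input[key];   // the lookup happens once, here
        ParameterExpression i = Expression.Parameter(typeof(int), "i");
        Expression body = Expression.ArrayIndex(Expression.Constant(column), i);
        return Expression.Lambda<Func<int, double>>(body, i).Compile();
    }
}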
As for Roslyn, you have the option of generating a syntax tree, or generating C# (or VB) and having it parsed into a syntax tree to be compiled. It is way too early to guess what the performance might be, as we do not have production code available (at the time of writing). Obviously, parsing a syntax tree will have some overhead, and if you are generating a single method, Roslyn's ability to generate multiple methods in parallel won't help a lot. Using Roslyn does have the potential to allow for very readable programs, though.
As for CodeDom, I would recommend against it. This is a very old API, that (in the current implementation) launches a CSC.exe process to compile your code. I also believe that it does not support the complete C# language.
I need to provide a copy of the source code to a third party, but given it's a nifty extensible framework that could be easily repurposed, I'd rather provide a less OO version (a 'procedural' version for want of a better term) that would allow minor tweaks to values etc but not reimplementation using the full flexibility of how it is currently structured.
The code makes use of the usual stuff: classes, constructors, etc. Is there a tool or method for 'simplifying' this into what is still the 'source', but using only plain variables etc.?
For example, if I had a class instance 'myclass' which initialised this.blah in the constructor, the same could be done with a variable called myclass_blah which would then be manipulated in a more 'flat' way. I realise some things like polymorphism would probably not be possible in such a situation. Perhaps an obfuscator, set to a 'super mild' setting would achieve it?
Thanks
My experience with nifty extensible frameworks has been that most shops have their own nifty extensible frameworks (usually more than one) and are not likely to steal them from vendor-provided source code. If you are under obligation to provide source code (due to some business relationship), then, at least in my mind, there's an ethical obligation to provide the actual source code, in a maintainable form. How you protect the source code is a legal matter and I can't offer legal advice, but really you should be including some license with your release and dealing with clients who are not going to outright steal your IP (assuming it's actually yours under the terms you're developing it.)
As had already been said, if this is a requirement based on restrictions of contracts then don't do it. In short, providing a version of the source that differs from what they're actually running becomes a liability and I doubt that it is one that your company should be willing to take. Proving that the code provided matches the code they are running is simple. This is also true if you're trying to avoid license restrictions of libraries your application uses (e.g. GPL).
If that isn't the case then why not provide a limited version of your extensibility framework that only works with internal types and statically compile any required extensions in your application? This will allow the application to continue to function as what they currently run while remaining maintainable without giving up your sacred framework. I've never done it myself but this sounds like something ILMerge could help with.
If you don't want to give out framework - just don't. Provide only source you think is required. Otherwise most likely you'll need to either support both versions in the future OR never work/interact with these people (and people they know) again.
Don't forget that non-obfuscated .Net assemblies have IL in easily de-compilable form. It is often easier to use ILSpy/Reflector to read someone else code than looking at sources.
If the reason to provide code is some sort of inspection (even simply looking at the code) you'd better have semi-decent code. I would seriously consider throwing away tool if its code looks written in FORTRAN-style using C# ( http://www.nikhef.nl/~templon/fortran/fortran_style ).
Side note: I believe "nifty extensible frameworks" are one of the roots of "not invented here" syndrome - I'd be more worried about comments on the framework (like "this code is ##### because it does not use YYY pattern and spacing is wrong") than reuse.
I have the situation that the same repeating refactoring tasks have to be done for a huge number of methods in my code.
For example, imagine an interface with 100 methods, each of which has one or more parameters as well as a return value. For each of these methods I need to jump to the implementation, change the return type, and add a line of code which converts the old return value to its new type for callers of the interface method.
Is there any way to quickly automate such refactorings?
I even thought of writing a custom script to do it, but writing an intelligent script would probably take longer than doing it manually.
A tool supporting such task can save a lot of time.
It's a good question, but in the time it took since you posted it (not to mention the time you spent searching for an answer before posting), you could have completed the changes manually.
I know, I know, it's utterly unsatisfying, but if you think of it as a form of meditation, and you only do this once a year, it's not that bad.
If your problem is one interface with 100 methods, then I agree with another poster: just doing it may seem painful but it is limited in effort and you can be done really soon.
If you have this problem repeatedly, or you have very large code base (many, many interfaces for which you want to perform this task), then what you need is a tool for implementing automated change: a program transformation engine. Such a tool provides the ability to parse source code, build a program representation (an abstract syntax tree), and enables one to apply "scripted" operations on the tree either through procedural interfaces and/or through source-to-source transformation patterns.
Our DMS Software Reengineering Toolkit is such a program transformation system. It has a C# front end to enable its application to C# code. Configuring such a tool for a complex task is not a matter of hours, so it is not useful for "small scale" changes. For large-scale changes, such tools can make it possible to do things that are simply not practical by hand.
Resharper and CodeRush both have features which can help with this kind of task.
Resharper's change signature functionality is probably the closest match.
Can't you generate a new interface from the class you have and then remove the methods you don't need, if it's that simple?
change the return type : by changing... the return type, provided it is not a standard type (...), and the converter can be implemented by a TypeConverter.
When I have such a boring task to do, I often switch from VS2010 to a tool that allows regex search and replace. In your example, maybe change 'return xxx;' to 'var yyy = Convert(xxx); return yyy;'
(For example, the editor Notepad++ (free) already offers quite a few possibilities to change everything in a project - use with caution.)
I've just coded a 700 line class. Awful. I hang my head in shame. It's as opposite to DRY as a British summer.
It's full of cut and paste with minor tweaks here and there. This makes it a prime candidate for refactoring. Before I embark on this, I thought I'd ask: when you have lots of repetition, what are the first refactoring opportunities you look for?
For the record, mine are probably using:
Generic classes and methods
Method overloading/chaining.
What are yours?
I like to start refactoring when I need to, rather than at the first opportunity I get. You might say this is somewhat of an agile approach to refactoring. When do I feel I need to? Usually when I feel that the ugly parts of my code are starting to spread. I think ugliness is okay as long as it is contained, but the moment it starts having the urge to spread, that's when you need to take care of business.
The techniques you use for refactoring should start with the simplest. I would strongly recommend Martin Fowler's book. Combining common code into functions, removing unneeded variables, and other simple techniques get you a lot of mileage. For list operations, I prefer using functional programming idioms. That is to say, I use internal iterators, map, filter and reduce (in Python-speak; there are corresponding things in Ruby, Lisp and Haskell) whenever I can; this makes code a lot shorter and more self-contained.
#region
I made a 1,000 line class only one line with it!
In all seriousness, the best way to avoid repetition is the things covered in your list, as well as fully utilizing polymorphism: examine your class and discover what would best be done in a base class, and how different components of it can be broken away as subclasses.
Sometimes by the time you "complete functionality" using copy and paste code, you've come to a point that it is maimed and mangled enough that any attempt at refactoring will actually take much, much longer than refactoring it at the point where it was obvious.
In my personal experience my favorite "way of removing repetition" has been the "Extract Method" functionality of Resharper (although this is also available in vanilla Visual Studio).
Many times I would see repeated code (some legacy app I'm maintaining) not as whole methods but in chunks within completely separate methods. That gives a perfect opportunity to turn those chunks into methods.
Monster classes also tend to reveal that they contain more than one functionality. That in turn becomes an opportunity to separate each distinct functionality into its own (hopefully smaller) class.
I have to reiterate that doing all of these is not a pleasurable experience at all (for me), so I really would rather do it right while it's a small ball of mud, rather than let the big ball of mud roll and then try to fix that.
First of all, I would recommend refactoring much sooner than when you are done with the first version of the class. Anytime you see duplication, eliminate it ASAP. This may take a little longer initially, but I think the results end up being a lot cleaner, and it helps you rethink your code as you go to ensure you are doing things right.
As for my favorite way of removing duplication.... Closures, especially in my favorite language (Ruby). They tend to be a really concise way of taking 2 pieces of code and merging the similarities. Of course (like any "best practice" or tip), this can not be blindly done... I just find them really fun to use when I can use them.
One of the things I do, is try to make small and simple methods that I can see on a single page in my editor (visual studio).
I've learnt from experience that making code simple makes it easier for the compiler to optimise it. The larger the method, the harder the compiler has to work!
I've also recently seen a problem where large methods have caused a memory leak. Basically I had a loop very much like the following:
while (true)
{
    var smallObject = WaitForSomethingToTurnUp();
    var largeObject = DoSomethingWithSmallObject();
}
I was finding that my application was keeping a large amount of data in memory because, even while the loop was blocked waiting for smallObject, the garbage collector could still see the largeObject from the previous iteration.
I easily solved this by moving the 'DoSomethingWithSmallObject()' and other associated code to another method.
Also, if you make small methods, your reuse within a class will become significantly higher. I generally try to make sure that none of my methods look like any others!
Hope this helps.
Nick
"cut and paste with minor tweaks here and there" is the kind of code repetition I usually solve with an entirely non-exotic approach- Take the similar chunk of code, extract it out to a seperate method. The little bit that is different in every instance of that block of code, change that to a parameter.
There's also some easy techniques for removing repetitive-looking if/else if and switch blocks, courtesy of Scott Hanselman:
http://www.hanselman.com/blog/CategoryView.aspx?category=Source+Code&page=2
I might go something like this:
Create custom (private) types for data structures and put all the related logic in there. Dictionary<string, List<int>> etc.
Make inner functions or properties that guarantee behaviour. If you're continually checking conditions on a publicly accessible property, then create a private getter method with all of the checking baked in (see the sketch after this list).
Split methods apart that have too much going on. If you can't describe what a method does succinctly or give it a good name, then start breaking the function apart until you can (even if these "child" functions aren't used anywhere else).
If all else fails, slap a [SuppressMessage("Microsoft.Maintainability", "CA1502:AvoidExcessiveComplexity")] on it and comment why.
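As a hedged sketch of the second point above (types and names invented): the condition checks live in one private accessor instead of being repeated at every call site.

using System;
using System.Collections.Generic;

class ScoreBook
{
    private readonly Dictionary<string, List<int>> scoresByUser = new Dictionary<string, List<int>>();

    // All argument checking and lazy initialisation is baked in here,
    // so callers inside the class never repeat it.
    private List<int> GetScores(string user)
    {
        if (string.IsNullOrEmpty(user))
            throw new ArgumentException("user must be non-empty", "user");

        List<int> scores;
        if (!scoresByUser.TryGetValue(user, out scores))
        {
            scores = new List<int>();
            scoresByUser[user] = scores;
        }
        return scores;
    }
}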