I often find myself in a situation where i am repeating two,three lines of code in a method multiple times and then think whether i should put that in a separate method to avoid code duplication. But then when i move those lines out of the method i find that the method just created is not reusable, was used only one time or requires an overload to be useful for another method.
My Question is what sort of patterns should we be looking for that indicate we should create a new method. I appreciate your response.
Don't put too much functionality in one method/class. Try to follow the single responsibility principle. It'll take some time getting familiar with that approach. But once you reach that level, you'll notice that it's done all by itself. Before coding, try to ask yourself, what functional units your concept includes.
For example, you want to develop an application, that can index the content of pdf files. It's just fictional, but at first sight, I could identify at least three components:
PdfParser - this provides you with the content of a pdf
Indexer - gets input from parser and counts meaningful words
Repository - it's for persistence; this could be made generic; so just say repository.Get<IndexData>(filename) or something
You should also try to code against interfaces. Especially when some kind of UI is involved. For example, you are developing a chat client with WinForms. If you follow the MVC/MVVM-pattern, you can easily (i.e., easier than coding against a Form object) use your original logic with a WPF version of the client.
I would start by reading about the DRY principle (Don't Repeat Yourself) hopefully it will give you a good answer for your question, which is a question that all developers should be asking themselves by the way, great question!!
See Don't repeat yourself
I wanted to leave it at DRY because it is such a simple but powerful concept that will need some reading and a lot of practice to get good add. But let me try to answer directly to your question (IMHO),
If you can't give your method a name that reflects exactly what your method is doing, break it into pieces that have meaning.
You'll find yourself DRYing up your code with ease, reusable pieces will show up, and you probably will never find yourself repeating code.
I would do this even if it meant having methods with only couple of lines of code.
Following this practice will give meaning to your code, make it readable and predictable, and definitely more reusable
If the lines of code that you intend to move to another method perform a specific set of actions (like read a file, calculate a value, etc.) then it is best to refactor into another helper method. Again, do this only if the helper method is being called at several places in your code or if your caller method is too long (definition of too long depends on the developer).
Similar questions
How do programmers practice code reuse
What techniques do you use to maximise code reuse?
Code Reusability: Is it worth it?
Coding Priorities: Performance, Maintainability, Reusability?
As a general rule, always think of those situations as functional entities. If a piece of code functionally performs a task (complex string conversion, parsing, etc), you should write reusable method.
If that function is specific to a certain type, then write an extension method.
You could create a local variable inside your function of type Action<> or Func<> and assign the code snippet to it. Then you can use it everywhere inside your function without polluting your class with too many little helper functions.
If you build a method for reusability, but don't use it in more than one place, then the reusability of you method isn't really verified.
Extract methods when it makes sense, and redesign those methods for reusability when you actually have the opportunity to reuse code.
Related
I've looked at both the Named Parameter Idiom and the Boost::Parameter library. What advantages does each one have over the other? Is there a good reason to always choose one over the other, or might each of them be better than the other in some situations (and if so, what situations)?
Implementing the Named Parameter Idiom is really easy, almost about as easy as using Boost::Parameter, so it kind of boils down to one main point.
-Do you already have boost dependencies? If you don't, Boost::parameter isn't special enough to merit adding the dependency.
Personally I've never seen Boost::parameter in production code, 100% of the time its been a custom implementation of Named Parameters, but that's not necessarily a good thing.
Normally, I'm a big fan of Boost, but I wouldn't use the Boost.Parameter library for a couple of reasons:
If you don't know what's going on,
the call looks like you're assigning
a value to a variable in the scope
on the calling function before
making the call. That can be very
confusing.
There is too much boilerplate code necessary to set it up in the first place.
Another point, while I have never used Named Parameter Idiom, I have used Boost Parameter for defining up to 20 optional arguments. And, my compile times are insane. What used to take a couple seconds, now takes 30sec. This adds up if you have a library of stuff that use your one little application that you wrote using boost parameter. Of course, I might be implementing it wrongly, but I hope this changes, because other than that, i really like it.
The Named Parameter idiom is a LOT simpler. I can't see (right now) why we would need the complexity of the Boost::Parameter library. (Even the supposed "feature" Deduced parameters, seems like a way to introduce coding errors ;) )
You probably don't want Boost.Parameter for general application logic so much as you would want it for library code that you are developing where it can be quite a time saver for clients of the library.
Never heard of either, but reviewing the links, named parameter is WAY easier and more obvious to understand. I'd pick it in a heartbeat over the boost implementation.
I'm pretty new to C# so bear with me.
One of the first things I noticed about C# is that many of the classes are static method heavy. For example...
Why is it:
Array.ForEach(arr, proc)
instead of:
arr.ForEach(proc)
And why is it:
Array.Sort(arr)
instead of:
arr.Sort()
Feel free to point me to some FAQ on the net. If a detailed answer is in some book somewhere, I'd welcome a pointer to that as well. I'm looking for the definitive answer on this, but your speculation is welcome.
Because those are utility classes. The class construction is just a way to group them together, considering there are no free functions in C#.
Assuming this answer is correct, instance methods require additional space in a "method table." Making array methods static may have been an early space-saving decision.
This, along with avoiding the this pointer check that Amitd references, could provide significant performance gains for something as ubiquitous as arrays.
Also see this rule from FXCOP
CA1822: Mark members as static
Rule Description
Members that do not access instance data or call instance methods can
be marked as static (Shared in Visual Basic). After you mark the
methods as static, the compiler will emit nonvirtual call sites to
these members. Emitting nonvirtual call sites will prevent a check at
runtime for each call that makes sure that the current object pointer
is non-null. This can achieve a measurable performance gain for
performance-sensitive code. In some cases, the failure to access the
current object instance represents a correctness issue.
Perceived functionality.
"Utility" functions are unlike much of the functionality OO is meant to target.
Think about the case with collections, I/O, math and just about all utility.
With OO you generally model your domain. None of those things really fit in your domain--it's not like you are coding and go "Oh, we need to order a new hashtable, ours is getting full". Utility stuff often just doesn't fit.
We get pretty close, but it's still not very OO to pass around collections (where is your business logic? where do you put the methods that manipulate your collection and that other little piece or two of data you are always passing around with it?)
Same with numbers and math. It's kind of tough to have Integer.sqrt() and Long.sqrt() and Float.sqrt()--it just doesn't make sense, nor does "new Math().sqrt()". There are a lot of areas it just doesn't model well. If you are looking for mathematical modeling then OO may not be your best bet. (I made a pretty complete "Complex" and "Matrix" class in Java and made them fairly OO, but making them really taught me some of the limits of OO and Java--I ended up "Using" the classes from Groovy mostly)
I've never seen anything anywhere NEAR as good as OO for modeling your business logic, being able to demonstrate the connections between code and managing your relationship between data and code though.
So we fall back on a different model when it makes more sense to do so.
The classic motivations against static:
Hard to test
Not thread-safe
Increases code size in memory
1) C# has several tools available that make testing static methods relatively easy. A comparison of C# mocking tools, some of which support static mocking: https://stackoverflow.com/questions/64242/rhino-mocks-typemock-moq-or-nmock-which-one-do-you-use-and-why
2) There are well-known, performant ways to do static object creation/logic without losing thread safety in C#. For example implementing the Singleton pattern with a static class in C# (you can jump to the fifth method if the inadequate options bore you): http://www.yoda.arachsys.com/csharp/singleton.html
3) As #K-ballo mentions, every method contributes to code size in memory in C#, rather than instance methods getting special treatment.
That said, the 2 specific examples you pointed out are just a problem of legacy code support for the static Array class before generics and some other code sugar was introduced back in C# 1.0 days, as #Inerdia said. I tried to answer assuming you had more code you were referring to, possibly including outside libraries.
The Array class isn't generic and can't be made fully generic because this would break backwards compatibility. There's some sort of magic going on where arrays implement IList<T>, but that's only for single-dimension arrays with a lower bound of 0 – "list-ish" arrays.
I'm guessing the static methods are the only way to add generic methods that work over any shape of array regardless of whether it qualifies for the above-mentioned compiler magic.
I have a method that is over 700+ lines long. In the beginning of the method, there are around 50 local variables declared. I decided to take the local variables out and put them into a separate class as properties so I could just declare the class in the method and use the properties through out it. Is this perfectly fine or does another data type fit in here such as a struct? This method was written during classic ASP times.
I have a method that is over 700+ lines long. In the beginning of the method, there are around 50 local variables declared.
Ok, so, the length of that method is also a problem. 700 lines is just too much to keep straight in one normal person's head all at once. When you have to fix a bug in there you end up scrolling up and down and up and down and... you get the idea. It really makes things hard to maintain.
So my answer is, yes, you should likely split up your data into a structure of some sort assuming that it actually makes sense to do so (i.e., I probably wouldn't create a SomeMethodParmaters class). The next thing to do is to split that method out into smaller pieces. You may even find that you no longer need a data structure as now each method only has a handful of variables declared for the work it needs to do.
Also, this is subjective, but there is really no good reason to declare all variables at the top of the method. Try declaring them as close to when they are actually used as possible. Again, this just keeps things nice and clean for maintenance in the future. It's much easier to concentrate on one section of code when you can see it all on the screen.
Hrm... I think you'd probably be better off refactoring the method to not have to operate on so many variables at all. For instance, five methods operating on ten variables each would infinitely better. As it stands now, it feels like you're simply trying to mask an issue rather than solve it.
I would strongly recommend you take a read through this book and/or any number of web sites concerned with refactoring. http://www.amazon.com/Refactoring-Improving-Design-Existing-Code/dp/0201485672
Although you can't look at 700+ lines in a single method and automatically say that is bad, it does indicate a bad code smell. Methods should be small units of code with a single purpose. This makes it easier for you to maintain or those who come behind you. It can also help you to figure out improvements to your design and make altering your design in the future much easier.
Creating a class just to hold properties without looking at what the overall structure should be is just hiding a problem. That is not to say in this particular instance that is not a perfectly acceptable and correct solution, just that you should make sure you are taking the time to provide a properly thought out design where your classes have the properties, state, and functionality they deserve.
Hope this helps.
The project I'm working on has just hit 4200 lines in the main C# file, which is causing IntelliSense to take a few seconds (sometimes up to 6 or so) to respond, during which Visual Studio locks up. I'm wondering how everyone else splits their files and whether there's a consensus.
I tried to look for some guides and found Google's C++ guide, but I couldn't see anything about semantics such as function sizes and file sizes; maybe it's there - I haven't looked at it for a while.
So how do you split your files? Do you group your methods by the functions they serve? By types (event handlers, private/public)? And at what size limit do you split functions?
To clarify, the application in question handles data - so its interface is a big-ass grid, and everything revolves around the grid. It has a few dialogs forms for management, but it's all about the data. The reason why it's so big is that there is a lot of error checking, event handling, and also the grid set up as master-detail with three more grids for each row (but these load on master row expanded). I hope this helps to clarify what I'm on about.
I think your problem is summed up with the term you use: "Main C# file".
Unless you mean main (as in the method main()) there is no place for that concept.
If you have a catch-all utility class or other common methods you should break them into similar functional parts.
Typically my files are just one-to-one mappings of classes.
Sometimes classes that are very related are in the same file.
If your file is too large it is an indication your class is too big and too general.
I try to keep my methods to half a screen or less. (When it is code I write from scratch it is usually 12 lines or fewer, but lately I have been working in existing code from other developers and having to refactor 100 line functions...)
Sometimes it is a screen, but that is getting very large.
EDIT:
To address your size limit question about functions - for me it is less about size (though that is a good indicator of a problem) and more about doing only one thing and keeping each one SIMPLE.
In the classic book "Structured Programming" Dijkstra once wrote a section entitled: "On our inability to do much." His point was simple. Humans aren't very smart. We can't juggle more than a few concepts in our minds at one time.
It is very important to keep your classes and methods small. When a method gets above a dozen lines or so, it should be broken apart. When a class gets above a couple of hundred lines, it should be broken apart. This is the only way to keep code well organized and manageable. I've been programming for nearly 40 years, and with every year that has gone by, I realize just how important the word "small" is when writing software.
As to how you do this, this is a very large topic that has been written about many different times. It's all about dependency management, information hiding, and object-oriented design in general. Here is a reading list.
Clean Code
SOLID
Agile Principles, Patterns, and Pratices in C#
Split your types where it's natural to split them - but watch out for types that are doing too much. At about 500 lines (of Java or C#) I get concerned. At about 1000 lines I start looking hard at whether the type should be split up... but sometimes it just can't/shouldn't be.
As for methods: I don't like it when I can't see the whole method on the screen at a time. Obviously that depends on size of monitor etc, but it's a reasonable rule of thumb. I prefer them to be shorter though. Again, there are exceptions - some logic is really hard to disentangle, particularly if there are lots of local variables which don't naturally want to be encapsulated together.
Sometimes it makes sense for a single type to have a lot of methods - such as System.Linq.Enumerable but partial classes can help in such cases, if you can break the type up into logical groups (in the case of Enumerable, grouping by aggregation / set operations / filtering etc would seem natural). Such cases are rare in my experience though.
Martin Fowler's book Refactoring I think gives you a good starting point for this. It instructs on how to identify "code smells" and how to refactor your code to fix these "smells." The natural result (although it's not the primary goal) is that you end up with smaller more maintainable classes.
EDIT
In light of your edit, I have always insisted that good coding practice for back-end code is the same in the presentation tier. Some very useful patterns to consider for UI refactorings are Command, Strategy, Specification, and State.
In brief, your view should only have code in it to handle events and assign values. All logic should be separated into another class. Once you do this, you'll find that it becomes more obvious where you can refactor. Grids make this a little more difficult because they make it too easy to split your presentation state between the presentation logic and the view, but with some work, you can put in indirection to minimize the pain caused by this.
Don't code procedurally, and you won't end up with 4,200 lines in one file.
In C# it's a good idea to adhere to some SOLID object-oriented design principles. Every class should have one and only one reason to change. The main method should simply launch the starting point for the application (and configure your dependency injection container, if you're using something like StructureMap).
I generally don't have files with more than 200 lines of code, and I prefer them if they're under 100.
There are no hard and fast rules, but there's a general agreement that more, shorter functions are better than a single big function, and more smaller classes are better than 1 big class.
Functions bigger than 40 lines or so should make you consider how you can break it up. Especially look at nested loops, which are confusing and often easy to translate to function calls with nice descriptive names.
I break up classes when I feel like they do more than 1 thing, like mix presentation and logic. A big class is less of a problem than a big method, as long as the class does 1 thing.
The consensus in style guides I've seen is to group methods by access, with constructors and public methods on the top. Anything consistent is great.
You should read up on C# style and refactoring to really understand the issues you're addressing.
Refactoring is an excellent book that has tips for rewriting code so that behavior is preserved but the code is more clear and easier to work with.
Elements of C# Style is a good dead tree C# style guide, and this blog post has a number of links to good online style guides.
Finally, consider using FxCop and StyleCop. These won't help with the questions you asked, but can detect other stylistic issues with your code. Since you've dipped your toe in the water you might as well jump in.
That's a lot, but developing taste, style and clarity is a major difference between good developers and bad ones.
Each class should do one small thing and do it well. Is your class a Form? Then it should not have ANY business logic in it.
Does it represent a single concept, like a user or a state? Then it shouldn't have any drawing, load/save, etc...
Every programmer goes through stages and levels. You're recognizing a problem with your current level and you are ready to approach the next.
From what you said, it sounds like your current level is "Solving a problem", most likely using procedural code, and you need to start to look more at new ways to approach it.
I recommend looking into how to really do OO design. There are many theories that you've probably heard that don't make sense. The reason they don't is that they don't apply to the way you are currently programming.
Lemme find a good post... Look through these to start:
how-do-i-break-my-procedural-coding-habits
are-there-any-rules-for-oop
object-oriented-best-practices-inheritance-v-composition-v-interfaces
There are also posts that will refer you to good OO design books. A "Refactoring" book is probably one of the very best places you could start.
You're at a good point right now, but you wouldn't believe how far you have to go. I hope you're excited about it because some of this stuff in your near future is some of the best programming "Learning Experiences" you'll ever have.
Good luck.
You can look for small things to change and change each slowly over time.
Are all the methods used in that class only? Look for support methods, such as validation, string manipulation, that can be moved out into helper/util classes.
Are you using any #region sections? Logical groupings of related methods in a #region often lend themselves to being split into separate classes.
Is the class a form? Consider using User Controls for form controls or groups of form controls.
Sometimes large classes evolve over time due to lots of developers doing quick fixes / new features without considering the overall design. Revisit some of the design theory links others have provided here and consider on-going support to enforce these such as code reviews and team workshops to review design.
Well, I'm afraid to say that you may have a bigger issue at hand than a slow load time. You're going to hit issues of tightly coupled code and maintainability/readability problems.
There are very good reasons to split class files into smaller files (and equally good reasons to move files to different projects/assemblies).
Think about what the purpose that your class is supposed to achieve. Each file should really only have a single purpose. If it's too generalized in its goal, for example, "Contain Shopping Basket Logic", then you're headed down the wrong path.
Also, as mentioned, the term you use: "Main C# file" just reeks that you have a very procedural mindset. My advise would be to stop, step back, and have a quick read up on some of the following topics:
General OOP principles
Domain-driven design
Unit testing
IoC Containers
Good luck with your searches.
Use Partial classes. You can basically break a single class into multiple files.
Perhaps the OP can respond: is your project using Object-Oriented Programming? The fact that you use the word "file" suggests that it is not.
Until you understand object orientation, there is no hope for improving your code in any important way. You'd do better to not split up the file at all, wait until it grows to be unbearably slow and buggy, then instead of bearing it any more, go learn OO.
The Intellisense parser in Visual Studio 2008 seems to be considerably faster than the one 2005 (I know they specifically did a lot of work in this area), so although you should definitely look into splitting the file up at some point as others have mentioned, Visual Studio 2008 may resolve your immediate performance problem. I've used it to open a 100K+ line Linq to SQL file without much issue.
Split the code so that each class/file/function/etc. does only One Thing™. The Single Responsibility Principle is a good guideline for splitting functionality into classes.
Nowadays, the largest classes that I write are about 200 lines long, and the methods are mostly 1-10 lines long.
If you have regions of code within a class, a simple method is to use the partial keyword and breakout that definition of the class into that file. I typically do this for large classes.
The convention I use is to have the ClassName_RegionName.cs. For example, if I want to break out a class that manages the schema for a database connection, and I called the class DatabaseConnection I would create a file called DatabaseConnection.cs for the main class and then DatabaseConnection_Schema.cs for the schema functionality.
Some classes just have to be large. That's not bad design; they are just implementation-heavy.
I've just coded a 700 line class. Awful. I hang my head in shame. It's as opposite to DRY as a British summer.
It's full of cut and paste with minor tweaks here and there. This makes it's a prime candidate for refactoring. Before I embark on this, I'd thought I'd ask when you have lots of repetition, what are the first refactoring opportunities you look for?
For the record, mine are probably using:
Generic classes and methods
Method overloading/chaining.
What are yours?
I like to start refactoring when I need to, rather than the first opportunity that I get. You might say this is somewhat of an agile approach to refactoring. When do I feel I need to? Usually when I feel that the ugly parts of my codes are starting to spread. I think ugliness is okay as long as they are contained, but the moment when they start having the urge to spread, that's when you need to take care of business.
The techniques you use for refactoring should start with the simplest. I would strongly recommand Martin Fowler's book. Combining common code into functions, removing unneeded variables, and other simple techniques gets you a lot of mileage. For list operations, I prefer using functional programming idioms. That is to say, I use internal iterators, map, filter and reduce(in python speak, there are corresponding things in ruby, lisp and haskell) whenever I can, this makes code a lot shorter and more self-contained.
#region
I made a 1,000 line class only one line with it!
In all seriousness, the best way to avoid repetition is the things covered in your list, as well as fully utilizing polymorphism, examine your class and discover what would best be done in a base class, and how different components of it can be broken away a subclasses.
Sometimes by the time you "complete functionality" using copy and paste code, you've come to a point that it is maimed and mangled enough that any attempt at refactoring will actually take much, much longer than refactoring it at the point where it was obvious.
In my personal experience my favorite "way of removing repetition" has been the "Extract Method" functionality of Resharper (although this is also available in vanilla Visual Studio).
Many times I would see repeated code (some legacy app I'm maintaining) not as whole methods but in chunks within completely separate methods. That gives a perfect opportunity to turn those chunks into methods.
Monster classes also tend to reveal that they contain more than one functionality. That in turn becomes an opportunity to separate each distinct functionality into its own (hopefully smaller) class.
I have to reiterate that doing all of these is not a pleasurable experience at all (for me), so I really would rather do it right while it's a small ball of mud, rather than let the big ball of mud roll and then try to fix that.
First of all, I would recommend refactoring much sooner than when you are done with the first version of the class. Anytime you see duplication, eliminate it ASAP. This may take a little longer initially, but I think the results end up being a lot cleaner, and it helps you rethink your code as you go to ensure you are doing things right.
As for my favorite way of removing duplication.... Closures, especially in my favorite language (Ruby). They tend to be a really concise way of taking 2 pieces of code and merging the similarities. Of course (like any "best practice" or tip), this can not be blindly done... I just find them really fun to use when I can use them.
One of the things I do, is try to make small and simple methods that I can see on a single page in my editor (visual studio).
I've learnt from experience that making code simple makes it easier for the compiler to optimise it. The larger the method, the harder the compiler has to work!
I've also recently seen a problem where large methods have caused a memory leak. Basically I had a loop very much like the following:
while (true)
{
var smallObject = WaitForSomethingToTurnUp();
var largeObject = DoSomethingWithSmallObject();
}
I was finding that my application was keeping a large amount of data in memory because even though 'largeObject' wasn't in scope until smallObject returned something, the garbage collector could still see it.
I easily solved this by moving the 'DoSomethingWithSmallObject()' and other associated code to another method.
Also, if you make small methods, your reuse within a class will become significantly higher. I generally try to make sure that none of my methods look like any others!
Hope this helps.
Nick
"cut and paste with minor tweaks here and there" is the kind of code repetition I usually solve with an entirely non-exotic approach- Take the similar chunk of code, extract it out to a seperate method. The little bit that is different in every instance of that block of code, change that to a parameter.
There's also some easy techniques for removing repetitive-looking if/else if and switch blocks, courtesy of Scott Hanselman:
http://www.hanselman.com/blog/CategoryView.aspx?category=Source+Code&page=2
I might go something like this:
Create custom (private) types for data structures and put all the related logic in there. Dictionary<string, List<int>> etc.
Make inner functions or properties that guarantee behaviour. If you’re continually checking conditions from a publically accessible property then create an private getter method with all of the checking baked in.
Split methods apart that have too much going on. If you can’t put something succinct into the or give it a good name, then start breaking the function apart until the code is (even if these “child” functions aren’t used anywhere else).
If all else fails, slap a [SuppressMessage("Microsoft.Maintainability", "CA1502:AvoidExcessiveComplexity")] on it and comment why.