Large Switch statements: Bad OOP?

I've always been of the opinion that large switch statements are a symptom of bad OOP design. In the past, I've read articles that discuss this topic, and they have provided alternative OOP-based approaches, typically based on polymorphism to instantiate the right object to handle the case.
I'm now in a situation that has a monstrous switch statement based on a stream of data from a TCP socket, in which the protocol consists of basically a newline-terminated command, followed by lines of data, followed by an end marker. The command can be one of 100 different commands, so I'd like to find a way to reduce this monster switch statement to something more manageable.
I've done some googling to find the solutions I recall, but sadly, Google has become a wasteland of irrelevant results for many kinds of queries these days.
Are there any patterns for this sort of problem? Any suggestions on possible implementations?
One thought I had was to use a dictionary lookup, matching the command text to the object type to instantiate. This has the nice advantage of merely creating a new object and inserting a new command/type in the table for any new commands.
However, this also has the problem of type explosion. I now need 100 new classes, plus I have to find a way to interface them cleanly to the data model. Is the "one true switch statement" really the way to go?
I'd appreciate your thoughts, opinions, or comments.

You may get some benefit out of a Command Pattern.
For OOP, you may be able to collapse several similar commands each into a single class, if the behavior variations are small enough, to avoid a complete class explosion (yeah, I can hear the OOP gurus shrieking about that already). However, if the system is already OOP, and each of the 100+ commands is truly unique, then just make them unique classes and take advantage of inheritance to consolidate the common stuff.
If the system is not OOP, then I wouldn't add OOP just for this... you can easily use the Command Pattern with a simple dictionary lookup and function pointers, or even dynamically generated function calls based on the command name, depending on the language. Then you can just group logically associated functions into libraries that represent a collection of similar commands to achieve manageable separation. I don't know if there's a good term for this kind of implementation... I always think of it as a "dispatcher" style, based on the MVC-approach to handling URLs.

I see having two switch statements as a symptom of non-OO design, where the switch-on-enum-type might be replaced with multiple types which provide different implementations of an abstract interface; for example, the following ...
switch (eFoo)
{
    case Foo.This:
        eatThis();
        break;
    case Foo.That:
        eatThat();
        break;
}
switch (eFoo)
{
    case Foo.This:
        drinkThis();
        break;
    case Foo.That:
        drinkThat();
        break;
}
... should perhaps be rewritten as ...
interface IAbstract
{
    void eat();
    void drink();
}
class This : IAbstract
{
    public void eat() { ... }
    public void drink() { ... }
}
class That : IAbstract
{
    public void eat() { ... }
    public void drink() { ... }
}
However, one switch statement isn't imo such a strong indicator that the switch statement ought to be replaced with something else.

The command can be one of 100 different commands
If you need to do one out of 100 different things, you can't avoid having a 100-way branch. You can encode it in control flow (switch, if-elseif^100) or in data (a 100-element map from string to command/factory/strategy). But it will be there.
You can try to isolate the outcome of the 100-way branch from things that don't need to know that outcome. Maybe just 100 different methods is fine; there's no need to invent objects you don't need if that makes the code unwieldy.
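As a rough sketch of the "encode it in data" option, the branch can live in a dictionary from command text to handler method. The class name, the two commands, and the handler signature below are illustrative assumptions, not part of the protocol described in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher: command names and handlers are placeholders.
public class CommandDispatcher
{
    // Maps the command line read from the socket to the method that handles it.
    private readonly Dictionary<string, Func<IReadOnlyList<string>, string>> handlers;

    public CommandDispatcher()
    {
        handlers = new Dictionary<string, Func<IReadOnlyList<string>, string>>(
            StringComparer.OrdinalIgnoreCase)
        {
            { "ECHO", Echo },
            { "COUNT", Count },
        };
    }

    // The 100-way branch is now a single lookup; unknown commands share one path.
    public string Dispatch(string command, IReadOnlyList<string> dataLines)
    {
        if (handlers.TryGetValue(command, out var handler))
            return handler(dataLines);
        return "ERR unknown command: " + command;
    }

    // Plain methods, not classes: no new type is needed per command.
    private static string Echo(IReadOnlyList<string> lines) => string.Join("\n", lines);

    private static string Count(IReadOnlyList<string> lines) => lines.Count.ToString();
}
```

Adding a command then means adding one method and one dictionary entry, and the code that reads the socket never needs to know which branch was taken.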

I think this is one of the few cases where large switches are the best answer unless some other solution presents itself.

I see the strategy pattern. If I have 100 different strategies...so be it. The giant switch statement is ugly. Are all the Commands valid classnames? If so, just use the command names as class names and create the strategy object with Activator.CreateInstance.
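A minimal sketch of that idea, assuming every command name maps to a class with a public parameterless constructor. ICommandStrategy, Ping, Upper and StrategyFactory are hypothetical names. Because the name arrives over a socket, the sketch only instantiates types that implement the strategy interface rather than any type the caller names.

```csharp
using System;
using System.Linq;

public interface ICommandStrategy
{
    string Execute(string payload);
}

// Hypothetical strategies; the class name doubles as the protocol command name.
public class Ping : ICommandStrategy
{
    public string Execute(string payload) => "PONG";
}

public class Upper : ICommandStrategy
{
    public string Execute(string payload) => payload.ToUpperInvariant();
}

public static class StrategyFactory
{
    public static ICommandStrategy Create(string commandName)
    {
        // Only concrete types implementing ICommandStrategy are candidates,
        // so untrusted input cannot instantiate arbitrary classes.
        Type type = typeof(ICommandStrategy).Assembly.GetTypes()
            .FirstOrDefault(t => typeof(ICommandStrategy).IsAssignableFrom(t)
                                 && !t.IsAbstract
                                 && string.Equals(t.Name, commandName,
                                                  StringComparison.OrdinalIgnoreCase));
        if (type == null)
            throw new ArgumentException("Unknown command: " + commandName);

        return (ICommandStrategy)Activator.CreateInstance(type);
    }
}
```

Scanning the assembly on every call is slow, so a real implementation would probably cache the name-to-type mapping once at startup.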

There are two things that come to mind when talking about a large switch statement:
It violates OCP - you could be continuously maintaining a big function.
You could have bad performance: O(n).
On the other hand a map implementation can conform to OCP and could perform with potentially O(1).

I'd say that the problem is not the big switch statement, but rather the proliferation of code contained in it, and abuse of wrongly scoped variables.
I experienced this in one project myself, when more and more code went into the switch until it became unmaintainable. My solution was to define one parameter class which contained the context for the commands (name, parameters, whatever, collected before the switch), create a method for each case statement, and call that method with the parameter object from the case.
Of course, a fully OOP command dispatcher (based on magic such as reflection or mechanisms like Java Activation) is more beautiful, but sometimes you just want to fix things and get work done ;)

You can use a dictionary (or a hash map if you are coding in Java). Steve McConnell calls this approach table-driven methods.

One way to improve this would be to make your code data-driven: for each command code, you register something that handles it (a function or an object). You could also use reflection to map strings to the objects/functions and resolve them at run time, but you may want to run some experiments to assess performance.

The best way to handle this particular problem (serialization and protocols) cleanly is to use an IDL and generate the marshaling code with switch statements. Whatever patterns (prototype factory, command pattern, etc.) you try to use otherwise, you'll need to initialize a mapping between a command id/string and a class/function pointer somehow, and it will run slower than switch statements, since the compiler can use a perfect hash lookup for switch statements.

Yes, I think large case statements are a symptom that one's code can be improved... usually by implementing a more object-oriented approach. For example, if I find myself evaluating the type of classes in a switch statement, that almost always means I could probably use generics to eliminate the switch statement.

You could also take a language approach here and define the commands with associated data in a grammar. You can then use a generator tool to parse the language. I have used Irony for that purpose. Alternatively you can use the Interpreter pattern.
In my opinion the goal is not to build the purest OO model, but to create a flexible, extensible, maintainable and powerful system.

I recently had a similar problem with a huge switch statement, and I got rid of the ugly switch with the simplest solution: a lookup table and a function or method returning the value you expect. The command pattern is a nice solution, but having 100 classes is not nice, I think.
So I had something like:
switch (id)
{
    case 1: DoSomething(url_1); break;
    case 2: DoSomething(url_2); break;
    ..
    ..
    case 100: DoSomething(url_100); break;
}
and I changed it to:
string url = GetUrl(id);
DoSomething(url);
GetUrl can go to the DB and return the URL you are looking for, or it could be backed by an in-memory dictionary holding the 100 URLs.
I hope this helps anyone out there replacing huge, monstrous switch statements.
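For the in-memory variant, GetUrl could look roughly like this. The UrlTable class name and the example.com URLs are placeholders; the real table would hold the 100 entries or be loaded from the database.

```csharp
using System;
using System.Collections.Generic;

public static class UrlTable
{
    // Placeholder data; in practice this could be filled from a DB or config file.
    private static readonly Dictionary<int, string> Urls = new Dictionary<int, string>
    {
        { 1, "https://example.com/one" },
        { 2, "https://example.com/two" },
    };

    // Replaces the switch: one lookup, with a single failure path for unknown ids.
    public static string GetUrl(int id)
    {
        if (Urls.TryGetValue(id, out string url))
            return url;
        throw new KeyNotFoundException("No URL registered for id " + id);
    }
}
```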

Think of how Windows was originally written in the application message pump. It sucked. Applications would run slower with the more menu options you added. As the command searched for ended further and further towards the bottom of the switch statement, there was an increasingly longer wait for response. It's not acceptable to have long switch statements, period. I made an AIX daemon as a POS command handler that could handle 256 unique commands without even knowing what was in the request stream received over TCP/IP. The very first character of the stream was an index into a function array. Any index not used was set to a default message handler; log and say goodbye.
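The function-array idea translates to C# roughly as follows. This is a sketch of the technique, not the AIX daemon itself: the ByteDispatcher name, the 'P' and 'L' commands, and the reply strings are all made up for illustration.

```csharp
using System;

public class ByteDispatcher
{
    // One slot per possible first byte of a request.
    private readonly Func<byte[], string>[] handlers = new Func<byte[], string>[256];

    public ByteDispatcher()
    {
        // Every slot starts out pointing at the default handler...
        for (int i = 0; i < handlers.Length; i++)
            handlers[i] = DefaultHandler;

        // ...and only the commands that exist overwrite their slot.
        handlers[(byte)'P'] = request => "PONG";
        handlers[(byte)'L'] = request => "LEN " + (request.Length - 1);
    }

    // Dispatch costs one array index, regardless of how many commands exist.
    public string Handle(byte[] request)
    {
        if (request == null || request.Length == 0)
            return DefaultHandler(request);
        return handlers[request[0]](request);
    }

    // Stand-in for "log and say goodbye".
    private static string DefaultHandler(byte[] request) => "BYE";
}
```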

Related

Is it safe to write dBCommand.AddParameter even though I'm not going to use it in a query?

Scenario: my query variable is dynamic; there are 4 possible values for it depending on the report type (_reportType). That means there are 4 different queries, and some of them don't have #STAFF in the WHERE condition. So my question is: is it safe to just leave my
dBCommand.AddParameter("#STAFF", staff)
there, or should I include an if-else condition just to be safe?
Like this:
if (_reportType == 1)
{
    dBCommand.AddParameter("#STAFF", staff);
}
else if (_reportType == 2)
{
    //code
}
else if (_reportType == 3)
{
    //code
}
else
{
    //Don't add dBCommand.AddParameter("#STAFF", staff);
}
Is it safe just to leave AddParameter("#STAFF", staff) even though I'm not going to use it in a query?
For example, I'm going to write
dBCommand.Initialize(string.Format(query, "RetailTable"), batch);
dBCommand.AddParameter("#STAFF", staff);
But the query value doesn't have #STAFF in the WHERE condition

It should generally be ok to specify unused parameters, aside from the minor overhead of sending the value to the server. The exception is if you execute DDL queries that have a restriction of being the only statement in the batch (e.g. CREATE VIEW). Those would fail due to the parameter.

There are 2 glaring bad practices in your approach:
1. Generating a dynamic query within the code.
This approach has many drawbacks and possible security loopholes. You should almost always avoid doing that.
Please go through the following links to understand this more:
https://codingsight.com/dynamic-sql-vs-stored-procedure/
https://www.ecanarys.com/Blogs/ArticleID/112/SQL-injection-attack-and-prevention-using-stored-procedure
2. Trying to use a generic WHERE clause that fits all your variations.
This approach is a disaster waiting to happen, regardless of whether the query is written in your application code or in a stored procedure.
This is an ugly code smell and a maintenance nightmare.
No developer can ever be 100% sure that no change will be required during the lifespan of the application, for the simple reason that the client WILL need enhancements on a regular basis.
So, even if this approach may work for you for a short period of time, it will backfire.
Assume that, over time, a few more filter parameters are added due to new requirements. Now imagine what your code would look like and the problems it could create if they are not handled properly, especially when YOU are not the one making those changes. Scary, right?
Always write code that is not only easy to read and understand, but also easy to enhance and maintain, regardless of the person writing the code.
So, IMHO, you should add those if-else conditions or use switch-case blocks to safeguard yourself and your client. It may look like overkill at the start, but it will surely pay off in the future.
Hope this helps!

Maintaining modularity in Main()?

I'm writing the simple card game "War" for homework and now that the game works, I'm trying to make it more modular and organized. Below is a section of Main() containing the bulk of the program. I should mention, the course is being taught in C#, but it is not a C# course. Rather, we're learning basic logic and OOP concepts so I may not be taking advantage of some C# features.
bool sameCard = true;
while (sameCard)
{
    sameCard = false;
    card1.setVal(random.Next(1,14)); // set card value
    val1 = determineFace(card1.getVal()); // assign 'face' cards accordingly
    suit = suitArr[random.Next(0,4)]; // choose suit string from array
    card1.setSuit(suit); // set card suit
    card2.setVal(random.Next(1,14)); // rinse, repeat for card2...
    val2 = determineFace(card2.getVal());
    suit = suitArr[random.Next(0,4)];
    card2.setSuit(suit);
    // check if same card is drawn twice:
    catchDuplicate(ref card1, ref card2, ref sameCard);
}
Console.WriteLine ("Player: {0} of {1}", val1, card1.getSuit());
Console.WriteLine ("Computer: {0} of {1}", val2, card2.getSuit());
// compare card values, display winner:
determineWinner(card1, card2);
So here are my questions:
Can I use loops in Main() and still consider it modular?
Is the card-drawing process written well/contained properly?
Is it considered bad practice to print messages in a method (i.e.: determineWinner())?
I've only been programming for two semesters and I'd like to form good habits at this stage. Any input/advice would be much appreciated.
Edit:
catchDuplicate() is now a boolean method and the call looks like this:
sameCard = catchDuplicate(card1, card2);
thanks to #Douglas.

Can I use loops in Main() and still consider it modular?
Yes, you can. However, more often than not, Main in OOP-programs contains only a handful of method-calls that initiate the core functionality, which is then stored in other classes.
Is the card-drawing process written well/contained properly?
Partially. If I understand your code correctly (you only show Main), you undertake some actions that, when done in the wrong order or with the wrong values, may not end up well. Think of it this way: if you sell your class library (not the whole product, but only your classes), what would be the clearest way to use your library for an uninitiated user?
I.e., consider a class Deck that contains a deck of cards. On creation it creates all cards and shuffles it. Give it a method Shuffle to shuffle the deck when the user of your class needs to shuffle and add methods like DrawCard for handling dealing cards.
Further: you have methods that are not contained within a class of their own, yet have functionality that would be better off in a class. For example, determineFace is better suited to be a method on class Card (assuming card2 is of type Card).
Is it considered bad practice to print messages in a method (i.e.: determineWinner())?
Yes and no. If you only want messages to be visible during testing, use Debug.WriteLine. In a production build, these will be no-ops. However, when you write messages in a production version, make sure that this is clear from the name of the method. I.e., WriteWinnerToConsole or something.
It's more common to not do this because: what format would you print the information? What text should come with it? How do you handle localization? However, when you write a program, obviously it must contain methods that write stuff to the screen (or form, or web page). These are usually contained in specific classes for that purpose. Here, that could be the class CardGameX for instance.
General thoughts
Think about the principle "one method/function should have one task and one task only, and it should not have side effects" (for example, if a method calculates a square and also prints it, the printing is the side effect).
The principle for classes is, very high-level: a class contains methods that logically belong together and operate on the same set of properties/fields. An example of the opposite: Shuffle should not be a method in class Card. However, it would belong logically in the class Deck.
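A sketch of how the Deck and Card classes described above might look. The member names (FaceName, DrawCard, Count), the Suit enum, and the choice to pass in the Random instance are assumptions made for illustration; the original homework code uses different names.

```csharp
using System;
using System.Collections.Generic;

public enum Suit { Clubs, Diamonds, Hearts, Spades }

public class Card
{
    public int Value { get; }   // 1 (ace) through 13 (king)
    public Suit Suit { get; }

    public Card(int value, Suit suit)
    {
        Value = value;
        Suit = suit;
    }

    // The face-name logic lives with the card instead of in Main.
    public string FaceName()
    {
        switch (Value)
        {
            case 1: return "Ace";
            case 11: return "Jack";
            case 12: return "Queen";
            case 13: return "King";
            default: return Value.ToString();
        }
    }

    public override string ToString() => FaceName() + " of " + Suit;
}

public class Deck
{
    private readonly List<Card> cards = new List<Card>();
    private readonly Random random;

    // Builds all 52 cards and shuffles them on creation.
    public Deck(Random random)
    {
        this.random = random;
        foreach (Suit suit in Enum.GetValues(typeof(Suit)))
            for (int value = 1; value <= 13; value++)
                cards.Add(new Card(value, suit));
        Shuffle();
    }

    public int Count => cards.Count;

    // Fisher-Yates shuffle of the remaining cards.
    public void Shuffle()
    {
        for (int i = cards.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            Card tmp = cards[i];
            cards[i] = cards[j];
            cards[j] = tmp;
        }
    }

    // Removing the card from the deck means it cannot be drawn twice.
    public Card DrawCard()
    {
        if (cards.Count == 0)
            throw new InvalidOperationException("The deck is empty.");
        Card top = cards[cards.Count - 1];
        cards.RemoveAt(cards.Count - 1);
        return top;
    }
}
```

With this in place, Main no longer needs a duplicate check: drawing two cards from the same Deck cannot produce the same card.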

If the main goal of your homework is to create a modular application, you must encapsulate all logic in specialized classes.
Each class must do only one job.
Functions that work with a card must be in a card class.
Functions that draw cards should be in another class.
I think that is the goal of your homework. Good luck!

Take all advice on "best practices" with a grain of salt. Always think for yourself.
That said:
Can I use loops in Main() and still consider it modular?
The two concepts are independent. If your Main() only does high-level logic (i.e. calls other methods) then it does not matter if it does so in a loop, after all the algorithm requires a loop. (you wouldn't add a loop unnecessarily, no?)
As a rule of thumb, if possible/practical, make your program self-documenting. Make it "readable" so, if a new person (or even you, a few months from now) looks at it they can understand it at any level.
Is the card-drawing process written well/contained properly?
No. First of all, a card should never be selected twice. For a more "modular" approach I would have something like this:
while ( Deck.NumCards >= 2 )
{
    Card card1 = Deck.GetACard();
    Card card2 = Deck.GetACard();
    PrintSomeStuffAboutACard( GetWinner( card1, card2 ) );
}
Is it considered bad practice to print messages in a method (ie: determineWinner())?
Is the purpose of determineWinner to print a message? If the answer is "No", then it is not a matter of "bad practice": your function is plain wrong.
That said, there is such a thing as a "debug" build and a "release" build. To aid you in debugging the application and figuring out what works and what doesn't it is a good idea to add logging messages.
Make sure they are relevant and that they are not executed in the "release" build.

Q: Can I use loops in Main() and still consider it modular?
A: Yes, you can use loops, that doesn't really have an impact on modularity.
Q: Is the card-drawing process written well/contained properly?
A: If you want to be more modular, turn DrawCard into a function/method. Maybe just write DrawCards instead of DrawCard, but then there's an optimization-versus-modularity question there.
Q: Is it considered bad practice to print messages in a method (ie: determineWinner())?
A: I wouldn't say printing messages in a method is bad practice, it just depends on context. Ideally, the game itself doesn't handle anything but game logic. The program can have some kind of game object and it can read state from the game object. This way, you could technically change the game from being text-based to being graphical. I mean, that's ideal for modularity, but it may not be practical given a deadline. You always have to decide when you have to sacrifice a best practice because there isn't enough time. Sadly, this is all too often a common occurrence.
Separate game logic from the presentation of it. With a simple game like this, it's an unnecessary dependency.

Oop data structure advice

I am writing a log file decoder which should be capable of reading many different structures of files. My question is how best to represent this data. I am using C#, but am new to OOP.
An example:
The log files have a range of sensor values. One sensor reading can be called A, another B. Obviously, there are many more than 2 entry types.
In different log files, they could be stored either as ABABABABAB or AAAAABBBBB.
I was thinking of describing this as blocks of entries. So in the first case, a block would be 'AB', with 5 blocks. In the second case, the first block is 'A', read 5 times. This is followed by a block of 'B', read 5 times.
This is quite a simplification (there are actually 40 different types of log file, each with up to 40 sensor values in a block). No log has more than 300 blocks.
At the moment, I store all of this in a datatable. I have a column for each entry, with a property of how many to read. If this is set to -1, it continues to the next column in the block. If not, it will assume that it has reached the end of the block.
This all seems quite clumsy. Can anyone suggest a better way of doing this?

I think you should first start here, and then here to learn a little bit about what object oriented programming is. Don't worry about your current problem while learning about OOP.
As you are learning about OO concepts, you should begin to understand that code is not data, and data is not code. It does not matter how you represent your data from an OOP stance. You can write OO code to consume your data, or you could write procedural code to consume your data; that part is irrelevant to the format of the data.
So then getting back to your question
My question is how best to represent this data
It depends on your needs. What is writing the log file? Do you have control over the writer and reader? If I did, I would rely on the built-in serialization methods to minimize the amount of code I need to write. Is the log file going to be really long? If so, the "datatable" approach you described is usually better. If the log file isn't going to be huge in file size, XML is really easy to work with.

Very basic and straightforward:
Define an interface IEntry with properties like string EntryBlock, int Count
Define a class Entry which represents an entry and implements IEntry
Code which does binary serialization should depend on the interfaces; for instance, it should refer to IEnumerable<IEntry>
Class Entry could override ToString() to return something like [ABAB-2], if that would be helpful during serialization
Interface IEntry could provide a method void CreateFromRawString(string rawDataFromLog) if it would be helpful; decide for yourself
If you want more info, please share the code you are using for serialization/deserialization

In addition to what Bob has offered, I highly recommend Head First Design Patterns as a gentle, but robust introduction to OO for a C# programmer. The samples are in Java, which translate easily to C#.

As for OOP, you want to learn SOLID.
I would suggest you build this using Test Driven Development.
Start small, with a simple fragment of your log data and write a test like (you'll find a better way to do this with experience and apply it to your situation):
[Test]
public void ReadSequence_FiveA_ReturnsProperList()
{
    // Arrange
    string sequenceStub = "AAAAA";
    // Act
    MyFileDecoder decoder = new MyFileDecoder();
    List<string> results = decoder.ReadSequence(sequenceStub);
    // Assert
    Assert.AreEqual(5, results.Count);
    Assert.AreEqual("A", results[0]);
}
That test code snippet is just a starting point, and I've tried to be rather verbose in the assertions. You can come up with more creative ways over time. The point is to start small. Once this test passes, add another test where you mix "AB" and change your decoder to handle this properly. Eventually, you'll have a large set of tests that handle your different formats. Using TDD, you'll be on the path to using SOLID properly. Whenever you find something you can't test, you should review the rules and see if you can't make it simpler and inject dependencies.
Eventually you'll get into mocking. For example, you might find that you'd rather INJECT the ability for your MyFileDecoder class to have a dependency that will read your log file. In that case, you would create a mock object and pass that into the constructor and set the mock to return the sequenceStub when a method is called.
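The injection step might be sketched like this, using a hand-written stub in place of a mocking library. ILogSource, StubLogSource and the parameterless ReadSequence are hypothetical; the one-character-per-entry parsing is only a stand-in for the real decoding rules.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dependency: whatever supplies the raw log text.
public interface ILogSource
{
    string ReadAll();
}

// Hand-written stub; a mocking library could generate this instead.
public class StubLogSource : ILogSource
{
    private readonly string content;

    public StubLogSource(string content)
    {
        this.content = content;
    }

    public string ReadAll() => content;
}

public class MyFileDecoder
{
    private readonly ILogSource source;

    // The decoder no longer knows whether the data comes from a file or a test.
    public MyFileDecoder(ILogSource source)
    {
        this.source = source;
    }

    // Placeholder parsing: treats every character as one sensor entry.
    public List<string> ReadSequence()
    {
        var results = new List<string>();
        foreach (char c in source.ReadAll())
            results.Add(c.ToString());
        return results;
    }
}
```

A test would then construct the decoder with new StubLogSource("AAAAA") and make the same assertions as before, without touching the file system.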

Declaring and creating an object then adding to collection VS Adding object to collection using new keyword to create object

OK, so the title may have been confusing, so I have posted 2 code snippets to illustrate what I mean.
NOTE: allUsers is just a collection.
RegularUser regUser = new RegularUser(userName, password, name, emailAddress);
allUsers.Add(regUser);
VS
allUsers.Add(new RegularUser(userName, password, name, emailAddress));
Which snippet, A or B, is better and why?
What are the advantages or disadvantages?
The example I wrote was C#, but does the language (C#, Java etc.) make a difference?

As far as C# is concerned, both of your code examples are practically identical at the IL level. The second example still creates a reference to the created object and pushes it onto the stack; you just don't have a local variable hooked up to it. This will not create any performance problems at all.

1) Which snippet A or B is better and why?
They're really identical. The compiled code will be nearly identical, since a temporary object is pushed onto the stack, then used in the method call.
2) What are the advantages or disadvantages?
The main advantages and disadvantages to the approach are really just readability.
Your first example has the advantage of keeping a single "operation" per line of code, which, in many ways, is more maintainable.
The second example removes the unnecessary variable declaration, which may be more maintainable.
Personally, I feel that the number of parameters in your RegularUser constructor would probably push me, in this instance, towards your first option. I typically find that, when a line of code gets to be more than about half a screen width on an average monitor, it's easier to read and understand if it's split up. Splitting this up by introducing the temporary and calling Add separately makes this more clear.
However, if you're just adding an integer or a class that's very small, I'd probably vote to skip the unnecessary variable. This is completely a personal preference, however - your mileage may (and probably will) vary.
3) The example i wrote was C# but does the language (C#, Java etc) make a difference?
No, for the most part. This is really language/implementation dependent, but most languages will have the same basic behavior and performance in both cases. It is possible (and highly likely) that some languages may treat this differently, but most mainstream languages will not.

I really like to create them the first way unless I really really know what is going on. It is much harder to do debugging if you don't create the object first...
The compiler will just turn the 2nd version into the 1st for you, anyway, so there isn't a net negative effect.
Pros of #1:
easier to debug (!)
theoretically easier to read, clearer
can use the object later
Cons:
more verbose
can be unnecessary, especially for trivial objects
Result:
1 for anything complex to create, or that may need to be inspected easily at debug time
2 for lots of annoying little stuff, like the following.
var list = new List<NameValuePair>(3);
list.Add(new NameValuePair("name", "valuable"));
list.Add(new NameValuePair("age", "valuable"));
list.Add(new NameValuePair("height", "not valuable"));
var dates = new List<DateTime>();
dates.Add(DateTime.Now);
dates.Add(DateTime.Now.Date.AddYears(-2));
As far as I know there isn't a real difference between languages when it comes to this. Some may not allow it, though.

Both are equal in terms of performance.
In terms of maintainability the second case is a nightmare; it is (nearly) impossible to trace in a debugger. So I tend to prefer the first one. In my early OOP days I was always writing the second, because "I knew that they were objects and I was sooo good at grasping objects that I ... blah blah blah", but that wore off with time, and especially with maintenance time.
Also, suppose that someone wants you to
FilterClass.FilterUser(regUser)
or
Database.AddToDatabase(regUser)
because it is the right place to do so, the first scenario is better.
Finally, when do you stop?
allUsers.Add(new RegularUser(new ReadFromInput(new EscapedName(new Name(new String(userName)))), password, name, emailAddress));

Would you use regions within long switch/enum declarations?

I've recently found myself needing (yes, needing) to define absurdly long switch statements and enum declarations in C# code, but I'm wondering what people feel is the best way to split them into logical subsections. In my situation, both the enum values and the cases (which are based on the enum values) have fairly clear groupings, yet I am slightly unsure how to reflect this in code.
Note that in my code, I have roughly 5 groups of between 10 and 30 enum values/cases each.
The three vaguely sensible options I can envisage are:
Define #region blocks around all logical groups of cases/enum values within the declaration (optionally separated by blank lines).
Comment each group with its name, with a blank line before each group name comment.
Do nothing whatsoever - simply leave the switch/enum as a huge list of cases/values.
Which do you prefer? Would you treat enums and switches separately? (This would seem slightly odd to me.) Now, I wouldn't say that there is any right/wrong answer to this question, though I would nonetheless be quite interested in hearing what the general consensus of views is.
Note 1: This situation where I might potentially have an extremely long enum declaration of 50/100+ values is unfortunately unavoidable (and similarly with the switch), since I am attempting to write a lexer (tokeniser), and this would thus seem the most reasonable approach for several reasons.
Note 2: I am fully aware that several duplicate questions already exist on the question of whether to use regions in general code (for structuring classes, mainly), but I feel my question here is much more specific and hasn't yet been addressed.

Sure, region those things up. They probably don't change much, and when they do, you can expand the region, make your changes, collapse it, and move on to the rest of the file.
They are there for a reason, use them to your advantage.

You could also have a Dictionary<[your_enum_type], Action> (or Func instead of Action) or something like that (assuming your functions have a similar signature). Then, instead of using a switch like this:
switch (item)
{
    case Enum1:
        func1(par1, par2);
        break;
    case Enum2:
        func2(par1, par2);
        break;
}
you could have something like:
public class MyClass
{
    // Must be created before any function is inserted into it
    private readonly Dictionary<int, Action<int, int>> myDictionary =
        new Dictionary<int, Action<int, int>>();
    // These could have only static methods also
    private readonly Group1Object myObject1;
    private readonly Group2Object myObject2;

    public MyClass()
    {
        // Again, you wouldn't have to initialize if the functions in them were static
        myObject1 = new Group1Object();
        myObject2 = new Group2Object();
        BuildMyDictionary();
    }

    private void BuildMyDictionary()
    {
        InsertGroup1Functions();
        InsertGroup2Functions();
        //...
    }

    private void InsertGroup2Functions()
    {
        myDictionary.Add(1, myObject2.AnAction2);
        myDictionary.Add(2, myObject2.AnotherAction2);
    }

    private void InsertGroup1Functions()
    {
        myDictionary.Add(3, myObject1.AnAction1);
        myDictionary.Add(4, myObject1.AnotherAction1);
    }

    public void DoStuff(int arg1, int arg2)
    {
        int t = 3; // Get it from wherever
        // instead of switch
        myDictionary[t](arg1, arg2);
    }
}

I would leave it as a huge list of cases/ values.

If there are some cases that have the same code block, using the Strategy design pattern could remove the switch block. This can create a lot of classes for you, but it will show how complex the logic really is and split it into smaller classes.

Get rid of the enums and make them into objects. You could then call methods on your objects and keep the code separated, maintainable, and not a nightmare.
There are very few cases when you would actually need to use an enum instead of an object and nobody likes long switch statements.

Here's a good shortcut for people who use regions.
I was switching between Eclipse and Visual Studio when I tried to go full screen in VS by pressing
Ctrl-M-M
and lo and behold, the region closed and expanded!
