How much info should I put into a class? (OOP) - C#

I'm a first-level C# programming student, though I've been dabbling in programming for a few years, and I'm learning above and beyond what the class teaches so that I'm thoroughly prepared once I get out into the job environment. This particular class isn't OOP at all (that's actually the next class), but for this project the teacher said he wouldn't mind if we went above and beyond and did the project in OOP (in fact, you can't get an A in his class unless you go above and beyond anyway).
The project is (at this point) to read in an XML file, byte by byte, store the element tags in one array, and the data values in another. I fought with him on this (given the .NET Framework's built-in XML handling), but that was a losing battle. He wants us to code this without using .NET's XML classes.
He did provide an example of OOP for this program that he slopped together (originally written in Java, ported to C++, then ported from C++ to C#).
In his example he's got three classes. The first, XMLObject, contains the arrays, a quasi-constructor, getter and setter methods (not properties, which I plan to fix in my version), and a method for adding the < and > to tags to be stored in the arrays (and output to the console if need be).
The second class is a parseXML class. In this one he has fields that keep track of the line count, file offset, and tag offset, plus strings to hold elements and data.
Again, he's got getter and setter methods, several parse methods that search for different things, and a general parse method that uses the other parse methods (sort of combining them). Some of these methods call the XMLObject class's methods and send the parsed element and data values to their respective arrays.
The third class has no fields and two methods: one that does ATOI and one that dumps a portion of the file stream to the console.
I know we're essentially building a less efficient version of what's already included in the .NET Framework. I've pointed this out to him and was told "do not use .NET's XML classes, end of discussion", so let's all agree to leave that one alone.
My question is: should those really be three separate classes? Shouldn't the parsing class either inherit from the XMLObject class or just be coded inside it, and shouldn't the ATOI and dumping methods live in one of those two classes as well?
It makes sense to me that if the parsing class's aim in life is to parse an XML file and store elements and data fields in arrays, it should be in the same class rather than being isolated and having to work through getters and setters (or properties, in the version I'm going to write). I don't see why the arrays would need to be encapsulated away from the parse methods that actually give them what to store.
Any help would be appreciated, as I'm still designing this and want to get at least reasonably close to "proper" (I know it's a relative term) OOP form.

The general rule is that we count the size of a class in the number of responsibilities it has:
A class should have a single responsibility: a single reason to change.
It seems to me that your teacher did separate his responsibilities correctly. He separated the presentation from the XML parsing logic, and he separated the XML data from the XML parsing behavior.

First: if you're in a programming class, there may be a good reason he wants you to do this by hand. I really don't recommend arguing with your professors; you'll never win, and you can hurt your grades.
Second: his version is not too terrible (considering that it is largely a re-writing of parts of the System.Xml namespace). Basically you have one class that "is" your XML. Think of it like the XDocument or XmlDocument classes: it just contains the XML itself. Then you have your XML parser: think of that like XmlReader. And your last one is sort of his equivalent of XmlWriter.
Remember that with OOP, your XML class (the one that represents the document itself) should neither know nor care how it came into possession of the information it has. Further, the parser should know how to get the XML, but it shouldn't much care where it gets stored. Finally, your writer class shouldn't really care where the data is coming from, only where it's going.
I know it's over-used, but think of your program like a car: it has several parts that all have to work together, but you should be able to change any given part without majorly affecting the other pieces. If you lump everything into one class, you lose that flexibility.
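Purely as an illustration of that separation (a minimal sketch; the names and member signatures are my own invention, not the teacher's actual code):

using System.Collections.Generic;

// Holds the parsed data; neither knows nor cares where it came from.
public class XmlObject
{
    private readonly List<string> _tags = new List<string>();
    private readonly List<string> _values = new List<string>();

    public void AddTag(string name) { _tags.Add("<" + name + ">"); }
    public void AddValue(string value) { _values.Add(value); }
}

// Knows how to read the file; doesn't care how the data is stored
// beyond handing it to an XmlObject.
public class XmlParser
{
    private readonly XmlObject _document;

    public XmlParser(XmlObject document) { _document = document; }

    public void Parse(string path)
    {
        // Read the file byte by byte, find tags and values, then call
        // _document.AddTag(...) and _document.AddValue(...).
    }
}

Either class can change internally (say, how the tags are stored, or reading from a stream instead of a file path) without touching the other.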

Some points:
Classes are nouns; methods are verbs.
Your class should be called XmlParser.
Since the XML parser is neither part of the XMLObject nor extends the XMLObject, it should be a separate class.
The third class has nothing to do with either of the other two; it's just an ordinary Utilities class.
In general, each class should be responsible for a single unit of work or storage.
Don't try to put too much into a single class (see the "God object" anti-pattern).
There's nothing wrong with having lots of classes (as long as they all make sense).

Let's summarize what the system must do:
read in an XML file, byte by byte, store element tags in one array, and the data values in another.
I would probably slice it up in the following way:
Reader: Given a file path, yields the contents byte-wise (IEnumerable<byte>)
Tokenizer: Given an enumeration of bytes, yields tokens relevant to the XML context (IEnumerable<XmlToken>)
XmlToken: Base class for any output the tokenizer produces. For now you need two specializations:
Tag: An opening tag
Value: Contents of a tag
TokenDelegator: Accepts a Tokenizer and an instance of IXmlTokenVisitor (see the Visitor pattern)
TagAndValueStore: Implements IXmlTokenVisitor. Visit(Tag tag) and Visit(Value value) are implemented and the relevant content is stored in arrays.
You see, I ended up with 7 classes and 1 interface, but you may notice that you have laid the foundations for a fully-fledged XML parser.
Often, code that is sold as OO just plain isn't. A class should adhere to the Single Responsibility Principle.
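To make that concrete, here is a rough sketch of how those pieces might be declared. This is only a guess at the shapes involved; the type names follow the list above, but the member signatures are my own.

using System.Collections.Generic;

public interface IXmlTokenVisitor
{
    void Visit(Tag tag);
    void Visit(Value value);
}

public abstract class XmlToken
{
    public abstract void Accept(IXmlTokenVisitor visitor);
}

public class Tag : XmlToken
{
    public string Name { get; set; }
    public override void Accept(IXmlTokenVisitor visitor) { visitor.Visit(this); }
}

public class Value : XmlToken
{
    public string Text { get; set; }
    public override void Accept(IXmlTokenVisitor visitor) { visitor.Visit(this); }
}

// Walks the token stream and hands each token to the visitor.
public class TokenDelegator
{
    public void Process(IEnumerable<XmlToken> tokens, IXmlTokenVisitor visitor)
    {
        foreach (XmlToken token in tokens)
            token.Accept(visitor);
    }
}

// Collects tags and values: the "two arrays" the assignment asks for.
public class TagAndValueStore : IXmlTokenVisitor
{
    private readonly List<string> _tags = new List<string>();
    private readonly List<string> _values = new List<string>();

    public void Visit(Tag tag) { _tags.Add(tag.Name); }
    public void Visit(Value value) { _values.Add(value.Text); }
}

The Reader and Tokenizer would sit in front of this, turning the file into the IEnumerable<XmlToken> that TokenDelegator consumes.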

Related

Standard C# class/approach for representing a cash-flow "umbrella" value?

I am creating a program to model my financial ingoings and outgoings
For example, one value is "outgoings", made up of "rent" and "livingCosts",
and then "livingCosts" is made up of "food", "entertainment" and "houseBills" etc.
I want to define this "umbrella-term" relationship between the numeric values. I could create my own class, but I suspected there might already be a class / special approach in C# to do this, as it seems like a common problem. Is there?
There is no such class for two reasons: it's trivial to create, and everyone needs it to be a bit different. So actually making a class for it that then has to be customized, where the customization takes more work and code than the actual class, is not very efficient.
You will have to roll your own.
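As a starting point, here is a minimal sketch of what rolling your own might look like, assuming a simple composite where an item's amount is either set directly (a leaf like "rent") or summed from its children (an umbrella like "outgoings"); all names here are illustrative:

using System.Collections.Generic;
using System.Linq;

public class CashFlowItem
{
    public string Name { get; set; }
    public decimal? Amount { get; set; }              // set directly for leaf items
    public List<CashFlowItem> Children { get; private set; }

    public CashFlowItem(string name)
    {
        Name = name;
        Children = new List<CashFlowItem>();
    }

    // An umbrella item's total is the sum of its children; a leaf returns its own amount.
    public decimal Total
    {
        get { return Amount ?? Children.Sum(c => c.Total); }
    }
}

So "livingCosts" would be a CashFlowItem whose Children are "food", "entertainment" and "houseBills", and "outgoings" would contain "rent" and "livingCosts".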

What is Reflection property of a programming language?

It's said that most high-level dynamically typed languages are reflective. Reflection (computer programming) on Wikipedia explains it, but it doesn't really give a very clear picture of what it means. Can anyone explain it in a simpler way with a relevant example?
To give you an example of how to use Reflection in a practical way:
Let's assume you are developing an application which you'd like to extend using plugins. These plugins are simple assemblies containing just a class named Person:
namespace MyObjects
{
    public class Person
    {
        private string _prename;
        private string _postname;

        // Placeholder values stand in for the original "logic setting pre and postname".
        public Person() { _prename = "John"; _postname = "Doe"; }

        // Concatenate the variables and return.
        public string GetName() { return _prename + " " + _postname; }
    }
}
Well, plugins should extend your application at runtime. That means the content and logic should be loaded from another assembly while your application is already running, so these resources are not compiled into your assembly, i.e. MyApplication.exe. Let's assume they are located in a library: MyObjects.Person.dll.
You are now faced with the fact that you'll need to extract this information and, for example, access the GetName() function of MyObjects.Person.
// Requires: using System; using System.IO; using System.Reflection; using System.Windows.Forms;
// Create an assembly object to load our classes
Assembly testAssembly = Assembly.LoadFile(Path.Combine(Application.StartupPath, "MyObjects.Person.dll"));
Type objType = testAssembly.GetType("MyObjects.Person");
// Create an instance of MyObjects.Person
var instance = Activator.CreateInstance(objType);
// Call the method
string fullname = (string)objType.InvokeMember("GetName",
    BindingFlags.InvokeMethod | BindingFlags.Instance | BindingFlags.Public,
    null, instance, null);
As you can see, you can use System.Reflection to load resources dynamically at runtime. This might help in understanding the ways you can use it.
Have a look at this page to see examples of how to access assemblies in more detail. It's basically the same content I wrote.
To better understand reflection, think of an interpreter that evaluates a program. The interpreter is a program that evaluates other programs.
The program can (1) inspect and (2) modify either (a) its own state/behavior, or (b) the state/behavior of the interpreter running it.
There are then four combinations. Here is an example of each kind of action:
1a -- Read the list of fields an object has
2a -- Modify the value of a field based on the field's name; reflectively invoke methods
1b -- Inspect the current stack to know which method is currently executing
2b -- Modify the stack or how certain operations in the language are executed (e.g. message sends)
Type a is called structural reflection. Type b is called behavioral reflection. Reflection of type a is fairly easy to achieve in a language. Reflection of type b is far more complicated, especially 2b; it is an open research topic. What most people understand by reflection is 1a and 2a.
It is important to understand the concept of reification in order to understand reflection. When a statement in the interpreted program is evaluated, the interpreter needs to represent it. The interpreter probably has objects to model the fields, methods, etc. of the program being interpreted; after all, the interpreter is a program as well. With reflection, the interpreted program can obtain references to objects in the interpreter that represent its own structure. This is reification. (The next step would be to understand causal connection.)
There are various kinds of reflective features, and it's sometimes confusing to work out what is reflective or not and what it means. Think in terms of a program and an interpreter; I hope this will help you understand the Wikipedia page (which could be improved).
Reflection is the ability to query the metadata of the program that you wrote at run-time. For example: what classes are found inside an assembly, what methods, fields and properties those classes contain, and more.
.NET even has 'attributes': classes with which you can decorate other classes, methods, fields and more, and whose whole purpose is to add customized metadata that you can query at run-time.
Many times, details depend on metadata alone. At validation time we don't care whether a value is a string or an int, but we do care that it should not be null. In that case you need to check a property or attribute without caring about the specific class, and that's where reflection comes into the picture. Likewise, if you'd like to generate methods on the fly (as with dynamic objects in C# 4.0), that is also possible using reflection. Basically, it helps with behavior-driven or aspect-oriented programming.
Another popular use is testing frameworks. They use reflection to find the methods to test and run them in a proxy environment.
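To illustrate the attribute idea, here is a hedged sketch (the attribute, class and helper are all made-up names, not a real framework): define a custom attribute, decorate a property with it, and discover it through reflection at run-time.

using System;
using System.Reflection;

// A custom attribute that adds metadata to a property.
[AttributeUsage(AttributeTargets.Property)]
public class RequiredAttribute : Attribute { }

public class Customer
{
    [Required]
    public string Name { get; set; }

    public string Nickname { get; set; }
}

public static class Validator
{
    // Uses reflection to find properties marked [Required] and check they are not null.
    public static bool IsValid(object obj)
    {
        foreach (PropertyInfo prop in obj.GetType().GetProperties())
        {
            bool required = prop.GetCustomAttributes(typeof(RequiredAttribute), true).Length > 0;
            if (required && prop.GetValue(obj, null) == null)
                return false;
        }
        return true;
    }
}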
It is the ability of a programming language to adapt its behaviour based upon runtime information.
In the .NET/C# world this is used frequently.
For example, when serializing data to XML, an attribute can be added to a field to specify the name it should have in the resulting XML.
This is probably a better question for programmers.stackexchange.com.
But it basically just means that you can look at your code from within your code.
Back in my VB6 days there were some UI objects that had a Text property and others that had a Description property (or something other than 'Text' anyway, I forget). It was a pain, because I couldn't encapsulate code to deal with both kinds of objects the same way. With reflection I would at least have been able to look and see whether an object had a Text or a Description property.
Or sometimes two objects might both have a Text property, but they derive from different base classes and don't share any interface. Again, it's hard to encapsulate code like this in a statically typed language without the help of reflection, but with reflection even a statically typed language can deal with it.
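A quick sketch of that idea in C# (the helper is hypothetical; 'Text' and 'Description' are just the property names from the anecdote):

using System;
using System.Reflection;

public static class CaptionHelper
{
    // Returns the value of a Text property if the object has one,
    // otherwise falls back to a Description property, otherwise null.
    public static string GetCaption(object control)
    {
        Type type = control.GetType();
        PropertyInfo prop = type.GetProperty("Text") ?? type.GetProperty("Description");
        return prop != null ? (string)prop.GetValue(control, null) : null;
    }
}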

How do you arrange source code elements in C# 3.0?

Does the following look good?
Edit: The options are general in nature and may not be exhaustive in terms of C# elements.
A single source file can contain the following:
Notes:
Files can come in pairs: editable + generated
A single file can have only one namespace.
File: Option-1
One partial or full class per file
Zero or more enum per file
Zero or more structures per file
Zero or more delegate type per file
File: Option-2
One or more interfaces per file
File: Option-3
One static class per file
Within class: Option-1
There will be the following sections, in the given order:
Enums - Fields - Properties - Events - Delegates - Methods
Within each section, elements will be ordered by accessibility, e.g. public methods will appear before private methods. Inner types can have their own section between any two sections. Optionally, related fields and properties can be grouped together.
Within class: Option-2
Group closely related elements without looking at accessibility level. Use regions without fail.
Within class: Option-3
Just do not care. Let VS help you.
What do you guys think and do?
I only have a single element per file. If you need to group things together to tidy them up, that is what namespaces are for.
I also tend to stick fields and properties at the top of classes, followed by the constructors, then methods. I usually keep private methods next to the public ones that use them.
Edit: And under no circumstances should you use regions! Ever. At all. If your class is so big that you need to collapse huge portions of it, you've got far worse problems to worry about.
I generally put types in their own separate files (enums, structs, classes and delegates). Nested types go in the same file as their parent type.
Partial files are only used with generated files.
Within a file, the main structure is:
Nested classes
Consts, fields, event and delegate fields
Properties
Ctors
Finalizer
Methods (related ones are close to each other, not necessarily grouped by accessibility)
I'm not too strict on these rules. They're guidelines...
I use File: Option-3 and a mix of "Within class: Option-1" and "Within class: Option-2", depending on the class type. If there is a clear relationship then I'll go for Option-2, but most of the time I stick with Option-1.
I usually use Option-3 for files and Option-1 within classes. Classes are structured into these regions:
Nested Classes
Constants
Events / Delegates
Fields
Construction / Destruction / Finalization
Properties
Methods
I would also put only one element per file. It's easier to find elements when they are in their own files, especially in large projects.
Robert C. Martin's book Clean Code provides some useful guidance on this; although the contents are for Java, I found the guidance still very applicable to .NET.
The most important thing is to pick a style and stick with it. StyleCop is very useful for enforcing such rules.

Should I separate Dispose logic into a partial class file?

While refactoring some C# classes, I've run into classes that implement IDisposable.
Without thinking, I created partial class files for each class that implements the IDisposable interface.
E.g. for Stamper.cs -> Stamper.cs + Stamper.Dispose.cs,
where Stamper.cs contains the actual logic for stamping
and Stamper.Dispose.cs contains the dispose logic:
// Stamper.cs
public partial class Stamper
{
// actual logic
}
// Stamper.Dispose.cs
public partial class Stamper: IDisposable
{
// Implement IDisposable
}
When I looked at the code, Stamper.cs now looks a lot cleaner and more readable (now about 52 lines instead of 100, where around 50 lines were simply clean-up/dispose code).
Am I going too far with this?
*EDIT: Thanks folks for your opinions - I have decided to put the two files back together into one.
The problem I had faced was that I kept forgetting to update the IDisposable implementation after updating the actual logic.
Moreover, there wasn't much of a problem navigating between methods in the source code.
The first reason alone seems reason enough to stick with the one-file solution in my specific case.
Yes, too far. What's wrong with just sticking a #region around the code and folding it so you can't see it?
It seems about as arbitrary as creating a partial class for constructor logic. Now I have to look at two files to grok that class. Partial classes are only really worth it for designer-generated stuff...
I would prefer to see the dispose logic in the same file as the resources that warrant implementing IDisposable. Whilst there's an element of subjectivity, I'd say it's too far.
I think your solution is unsound. Partial classes should usually only be used to separate developer code from generated code. Regions usually do a better job of adding structure to your code.
If your clean-up procedure is heavy, it's acceptable, but not ideal.
It might be a good habit for boilerplate such as exposed events, heavy serialization methods and, in your case, memory management.
I prefer partial classes to outlining (#region). If you have to use a partial class or code outlining to make your code readable, that is usually a sign that the code needs to be changed. Tear the class apart, and only as a last resort use a partial class (or region) if the code is absolutely necessary for the upkeep of that class.
In your case, you could use a class that thinly wraps the unmanaged resource and exposes a single Dispose. Then in your other class, use the managed object and Dispose it with no extra logic.
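A minimal sketch of that thin-wrapper idea, assuming an IntPtr-style handle (the handle type and release call are placeholders, not real API):

using System;

// Thinly wraps an unmanaged handle; its whole job is to release it.
public sealed class NativeResource : IDisposable
{
    private IntPtr _handle;

    public NativeResource(IntPtr handle) { _handle = handle; }

    public void Dispose()
    {
        if (_handle != IntPtr.Zero)
        {
            // ReleaseHandle(_handle);  // placeholder for the real native release call
            _handle = IntPtr.Zero;
        }
    }
}

// The consuming class just disposes the wrapper; no clean-up logic of its own.
public class Stamper : IDisposable
{
    private readonly NativeResource _resource = new NativeResource(IntPtr.Zero);

    public void Dispose() { _resource.Dispose(); }
}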
If your class is only a thin wrapper, then I'd say your method is overkill, since the whole point of the class is to dispose of an unmanaged resource.
It's kind of an odd question, since it only has an impact on the developer, making it completely a matter of personal preference. I can only tell you what I would prefer, and that is that I would do it if a significant amount of logic was in the dispose portion.
Personally I try to keep my instantiation/initialization logic and my cleanup/disposal logic side-by-side; it's a good reminder.
As for partial classes, the only time I use them is if a class is very large and can be categorized into groups of methods. Hiding designer code is great too.
I'd favor using a partial class when, and only when, the code in question was computer-generated. If you have many classes which share similar code (which for various reasons has to be repeated, rather than being pulled out into its own class) it may be useful to have a few templates and a program to generate the code based upon such templates. Under that scenario, the templates would be regarded as source files and then generated files as intermediate object-ish code. It would seem entirely appropriate to pull out the template-generated code into partial classes.
In VB.NET, such an approach might be nice: it allows field declaration, initialization, and cleanup to be safely handled together within an IDisposable object. A moderate amount of boilerplate code is required, but field declarations after that are pretty clean. For example:
' Assuming Option Infer is on:
Dim MyThingie = RegDisposable(New DisposableThingie)
' If Option Infer wasn't on:
Dim MyThingie As DisposableThingie = RegDisposable(New DisposableThingie)
RegDisposable would be a class member that adds the new DisposableThingie to a list held by the class. The class's Dispose routine would then Dispose of all the items in the list.
Unfortunately, there's no clean way to do anything similar in C#, since field initializers cannot make use of the object about to be constructed (in VB.NET, field initializers run after the base object is constructed).

Regarding Passing Many Parameters

I have around 8-9 parameters to pass to a function which returns an array. I would like to know whether it's better to pass those parameters directly to the function or to pass an array instead. Which is the better way, and why?
If I were to do anything, it would be to create a structure that holds all the parameters, to get nice IntelliSense and strong names.
public struct user
{
public string FirstName;
public string LastName;
public string zilionotherproperties;
public bool SearchByLastNameOnly;
}
public user[] GetUserData(user usr)
{
//search for users using passed data and return an array of users.
}
Pass them individually, because:
that is the type-safe way.
IntelliSense will pick it up in Visual Studio and when you write your calling functions, you will know what's what.
It is faster to execute that way.
If the parameter really IS the array, though, then pass the array. Example:
For functions which look like this, use this notation:
Array FireEmployee(string first, string middle, string last, int id) {...}
For functions that look like this, use the array:
Array FireEmployees(Employee[] unionWorkers) {...}
Your scenario is covered by the Introduce Parameter Object refactoring in Martin Fowler's refactoring book. The book is well worth owning, but for those who don't, the refactoring is described here. There's also a preview on the publisher's site, and on Google books. It recommends replacing the parameters not with an array, but a new object.
Regarding Skeet's comment on my example above, that he would use a class instead of a structure: to make it clearer where to use a class and where to use a structure, I'm posting this too. I think there are others out there who are curious about this as well.
The main reason to use a class, as far as I could see, was that you could make it immutable, but that's possible with structures too,
for example:
struct user
{
    private readonly string _username;
    private readonly string _lastname;

    public user(string Username, string LastName)
    {
        _username = Username;
        _lastname = LastName;
    }

    public string UserName
    {
        get { return _username; }
    }

    public string LastName
    {
        get { return _lastname; }
    }
}
I have long felt that I no longer know the differences between classes and structures, now that we can have properties, initializers, fields and everything else that a class has in a structure too. I know classes are reference types and structures are value types, but what difference does that make in the case above, when using it as a parameter to a function?
I found this description of the differences on the site http://www.startvbdotnet.com/oop/structure.aspx, and that description is exactly how I mapped it in my head:
Structures can be defined as a tool for handling a group of logically related data items. They are user-defined and provide a method for packing together data of different types. Structures are very similar to classes. Like classes, they too can contain members such as fields and methods. The main difference between classes and structures is that classes are reference types and structures are value types. In practical terms, structures are used for smaller, lightweight objects that do not persist for long, and classes are used for larger objects that are expected to exist in memory for long periods.
Maybe this should be its own question, but I felt it was related, since we all had different views on the structure-vs-class question as a parameter.
I assume you're using C# 4 and can just use named parameters:
FireEmployee(
first: "Frank",
middle: "",
last: "Krueger",
id: 338);
These make the code almost as readable as VB or Smalltalk. :-)
If not, I would go with what Dave Markle has to say.
If this is library code that will see a lot of use, and if some of the parameters have typical values that are candidates for default values, then you should consider Dave Markle's advice and provide a selection of overloads with progressively fewer parameters. This is the approach recommended in the Microsoft Framework Design Guidelines.
Alternately, you can get a similar effect with Stefan's approach, by setting default values with member initializers and using a progression of ctor overloads.
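A hedged sketch of that overload progression, reusing the user struct from the earlier example (the class name, method names and the chosen defaults are invented for illustration):

public class UserSearch
{
    // Full overload: every parameter is explicit.
    public user[] GetUserData(string firstName, string lastName, bool searchByLastNameOnly)
    {
        // ... actual search logic ...
        return new user[0];
    }

    // Progressively smaller overloads supply the typical defaults.
    public user[] GetUserData(string firstName, string lastName)
    {
        return GetUserData(firstName, lastName, false);
    }

    public user[] GetUserData(string lastName)
    {
        return GetUserData(null, lastName, false);
    }
}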
If you really don't want to pass your arguments in separately, I would suggest creating a new class which encapsulates all of your arguments. You can (in Java, and most likely in C#) declare a public inner class inside the class containing the gnarly method for this purpose. This avoids having classes floating around which are really just helper types.
I would say pass them individually as well. I don't like the idea of creating a class and then passing that class through as an argument. It's a form of stamp coupling, which means making changes will be harder, since one class uses the other, and reusing one class means you have to reuse the other as well.
You could use an interface to reduce the stamp coupling, but that's too much overhead for my taste, so that's why I like to pass the arguments individually.
Do you really need 8-9 parameters for a single function? It seems to me that if you need that many parameters, you're probably doing too many different things in that function. Try refactoring the code into separate functions so that each function has exactly one purpose.
Do not pass them as an array unless the function acts on an array. I wouldn't create a new data structure to group the parameters either, for the following reasons:
1. Passing a new data structure hides what the function really needs as input (does it need all of the data structure, or only part of it?)
2. Related to 1, it makes unit tests more difficult (when writing a unit test you need to recreate the entire data structure)
3. If the input parameters are not related, you end up with a new data structure that groups unrelated data types for no other reason than to make a function call look neater
4. If you choose to pass the new data structure to your function, the function can only be used in scopes where that data structure is available
Really, the only disadvantage to passing each parameter to the function is that you might not be able to fit the call on one line of code, but don't forget the lines you need before the function call in which you fill up your data structure.
