Efficient usage of TextWriter - C#

Is there any alternative way to perform the operation:
textWriter.Write(myBigObject.ToString())
such that:
myBigObject is 'streamed' into the text representation without creating the whole string object in memory
there are no additional classes or objects used, besides myBigObject and textWriter
Example: Imagine that myBigObject has 50 string fields. There is no point in joining all these fields into a big string and then writing the object to a file, if it is somehow possible to write the strings one by one to the file.

If you have access to the code, you can add a method to MyBigObject that takes a TextWriter and writes out each property. For example:
using System.IO;

public class MyBigObject
{
    private string bigStringField1;
    private string bigStringField2;
    // etc.

    public void Write(TextWriter writer)
    {
        writer.Write(bigStringField1);
        writer.Write(bigStringField2);
        // etc.
    }
}
If subclasses of MyBigObject need to write their own representation, make the method virtual and have each subclass override it and call the base-class implementation.
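A minimal sketch of the virtual-method variant. The field names (`title`, `body`, `footer`) and the subclass `MyBiggerObject` are hypothetical; the point is that each class writes its own fields straight to the `TextWriter` and the subclass calls `base.Write` first, so no combined string is ever built.

```csharp
using System.IO;

public class MyBigObject
{
    // Hypothetical fields standing in for the "50 string fields".
    protected string title = "Title";
    protected string body = "Body";

    // Streams each field to the writer; nothing is concatenated in memory.
    public virtual void Write(TextWriter writer)
    {
        writer.Write(title);
        writer.Write(body);
    }
}

public class MyBiggerObject : MyBigObject
{
    private string footer = "Footer";

    public override void Write(TextWriter writer)
    {
        base.Write(writer);   // the base-class fields first
        writer.Write(footer); // then this subclass's own fields
    }
}
```

With a file, the call would be `using (var writer = new StreamWriter(path)) { obj.Write(writer); }`, and the text goes out field by field through the writer's buffer.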
If you don't own the code, and the fields are exposed through properties, you could build an adapter class that takes a MyBigObject and writes out each property. You could also build some extension methods that do the same thing.
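A sketch of the extension-method route, assuming the class you don't own exposes hypothetical string properties `Name` and `Description` (the stand-in class below only exists to make the example self-contained). The extension writes each property directly, so the caller still never builds the full string.

```csharp
using System.IO;

// Stand-in for a class you cannot modify (hypothetical properties).
public class MyBigObject
{
    public string Name { get; set; } = "";
    public string Description { get; set; } = "";
}

public static class MyBigObjectWriterExtensions
{
    // Writes each property to the writer, one at a time.
    public static void WriteTo(this MyBigObject obj, TextWriter writer)
    {
        writer.WriteLine(obj.Name);
        writer.Write(obj.Description);
    }
}
```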
If you cannot access the source code, you could use reflection to examine the fields on the object, grab the value of each field, and Write() out each value's ToString() representation. However, reflection is slower than direct field access, and it involves a lot more intermediate objects. I don't know if using reflection would be worth it in your case.
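A rough sketch of the reflection approach. `ReflectionWriter.WriteFields` is a hypothetical helper; it writes each instance field's `ToString()` in turn. Note that `GetFields` does not promise any particular order, private fields declared in base classes are not returned, and `GetValue` boxes value-type fields, which is part of the overhead mentioned above.

```csharp
using System.IO;
using System.Reflection;

public static class ReflectionWriter
{
    // Writes every instance field's value to the writer, one field at a time.
    public static void WriteFields(object obj, TextWriter writer)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (FieldInfo field in obj.GetType().GetFields(flags))
        {
            var value = field.GetValue(obj);
            // Skip null fields; otherwise write the value's text representation.
            if (value != null)
                writer.Write(value.ToString());
        }
    }
}
```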

Given the limitations you have outlined, this isn't possible. You would have to come up with a way to read the data from your object and write it out one char/byte/line at a time.
If you want to be able to loop over your properties and write them out one at a time, then this would be possible using reflection. However, I suspect going this route would result in using more memory than your original solution, as well as being much more complicated than a simple call to .ToString().

Related

Should a constructor parse input?

Often, I find that I must instantiate a bunch of objects, but I find it easier to supply the parameters for this instantiation as a human-readable text file, which I manually compose and feed into the program as input.
For instance, if the object is a Car then the file might be a bunch of rows, each containing the name, speed and color (the three mandatory constructor parameters) delimited with tabs:
My car 65 Red
Arthur's car 132 Pink
Old junk car 23 Rust brown
This is easy for me to inspect visually, modify or generate by another program. The program can then load the file, take each row, parse out the relevant parameters, feed them into a Car(string name, int speed, uint color) constructor and create the object.
Notice how there is some work that must be done on the input before it is compatible with the constructor: the speed must be converted from string to int with a call to int.Parse, and the color must be matched to an RGB value by looking up the English color name (perhaps the program would access Wikipedia to figure out each color's value, or consult a predefined map of name -> RGB somewhere).
My question is, from an OOP standpoint, who should do this parsing? The constructor, or the method calling the constructor?
With the first option, the advantage is simplicity. The calling function only has to do:
foreach (var row in input_file)
    list_of_objects_that_i_am_populating.Add(new Car(row));
And all the ugly parsing can be nicely contained in the constructor, which doesn't have much other code anyhow, so the parsing code can be easily read and modified without being distracted by non-parsing code.
The disadvantage is that code reuse goes out the window because now my object is joined at the hip to an input format (worse, because the input format is ad-hoc and manually composed, it is ephemeral and potentially not guaranteed to stay the same). If I reuse this object in another program, where I decide that it is convenient to slightly change the formatting of the input file, the two versions of the object definition are now divergent. I often find myself defining input formats in the comment section of the constructor, which seems a bit code-smelly.
Another disadvantage is that I have lost the ability to do batch operations. Recall the earlier example problem of mapping color names to values: What if I was using a web service that takes 1 minute to process every individual request, regardless of whether that request is asking to convert one color name or a million. With a very large input file, I would drastically slow down my application by accessing the service once for each row, instead of submitting one big request for all rows, and then instantiating the objects according to the reply.
What is the "correct" way to handle a situation like this? Should I parse input in the constructor and treat the problems above as exceptional issues that must be dealt with on a case-by-case basis? Should I let my calling method do the parsing (even though it may already be bloated with much convoluted program logic)?
In general, you should avoid doing this within the constructor. That would be a violation of the Single Responsibility Principle. Each type should only be responsible for the operations required within that type, and nothing else.
Ideally, a separate class would be responsible for parsing the data into its proper form (and nothing else). The method creating your instance would take that (parsed) data and create your types.
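One way to get both that separation and the batching the question asks about is a two-pass loader: split every row first, resolve all color names in one call, then construct the cars. A sketch, assuming tab-delimited input as in the question; `CarLoader.Load` and the `resolveColors` delegate (a stand-in for the slow web service) are hypothetical names, and error handling is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public string Name { get; }
    public int Speed { get; }
    public uint Color { get; }

    // The constructor only assigns already-parsed values.
    public Car(string name, int speed, uint color)
    {
        Name = name;
        Speed = speed;
        Color = color;
    }
}

public static class CarLoader
{
    // resolveColors is a hypothetical batch lookup (e.g. one web-service call)
    // that maps every requested color name to an RGB value.
    public static List<Car> Load(
        IEnumerable<string> lines,
        Func<IReadOnlyCollection<string>, IDictionary<string, uint>> resolveColors)
    {
        // Pass 1: split each non-empty tab-delimited row into name, speed, color.
        List<string[]> rows = lines
            .Where(line => line.Length > 0)
            .Select(line => line.Split('\t'))
            .ToList();

        // A single batched lookup for all distinct color names.
        List<string> colorNames = rows.Select(r => r[2]).Distinct().ToList();
        IDictionary<string, uint> rgb = resolveColors(colorNames);

        // Pass 2: convert the fields and construct the objects.
        return rows
            .Select(r => new Car(r[0], int.Parse(r[1]), rgb[r[2]]))
            .ToList();
    }
}
```

The `Car` type stays ignorant of the file format, and the service is hit once per file rather than once per row.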
I would create and use factory methods to load from a settings file or CSV. I would NOT put such code in the constructor itself.
Factory version 1:
public class Car
{
    ... your existing methods and data ...
    public static Car CreateFromCsv(string csv) { .... }
    public static Car CreateFromFile(string fileName) { ... }
}
Or use a dedicated Factory:
public static class CarFactory
{
    public static Car CreateFromCsv(string csv) { .... }
    public static Car CreateFromFile(string fileName) { ... }
}
Or a dedicated business logic class:
namespace BusinessLogic;
public class LoadCars
{
    public Car ExecuteForCsv(string csv) { ... }
    public Car ExecuteForFile(string fileName) { ... }
}
I think it's generally better practice to keep your FileParser separate from your Car class. I would personally parse the file and return a List<string[]> or something to that effect, then make an overload of the Car constructor, like this:
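public Car(string[] values)
{
    // do error handling here, e.g. expect exactly name, speed and color
    if (values == null || values.Length != 3)
        throw new ArgumentException("Expected name, speed and color.", nameof(values));

    // set the int param if it parses; if not, throw
    if (!int.TryParse(values[1], out int speed))
        throw new FormatException("Speed is not an integer: " + values[1]);

    Name = values[0];
    Speed = speed;
    // color lookup (name -> RGB) would go here
}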
So I would have one class that parses the file into its tokens (as strings) and does basic error handling (like checking that the file exists and that the record count is what you'd expect, etc.). Then do more specific input validation in the Car constructor, since that will apply to other input sources as well (say the user enters their input at the command line; you could still use that constructor effectively).
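A sketch of that separate tokenizing class. `CarFileTokenizer` and `ExpectedFieldCount` are hypothetical names; it reads tab-delimited lines from any `TextReader` (a file via `File.OpenText`, or console input), checks only the shape of each record, and leaves type conversion to the `Car` side.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CarFileTokenizer
{
    // Hypothetical record layout: name, speed, color.
    public const int ExpectedFieldCount = 3;

    // Splits each non-blank line on tabs and checks the field count only;
    // converting "65" to int or "Red" to RGB is left to the Car side.
    public static List<string[]> Tokenize(TextReader reader)
    {
        var records = new List<string[]>();
        string? line;
        int lineNumber = 0;

        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.Trim().Length == 0)
                continue; // ignore blank lines

            string[] fields = line.Split('\t');
            if (fields.Length != ExpectedFieldCount)
                throw new FormatException(
                    $"Line {lineNumber}: expected {ExpectedFieldCount} fields, got {fields.Length}.");

            records.Add(fields);
        }
        return records;
    }
}
```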
In general, avoid putting code in constructors which may throw an exception or simply fail to construct a properly formed object. And as you note in your question, your current implementation has tightly coupled your objects to a file format which is usually better delegated to a class or factory method.

How much info should I put into a class? (OOP)

I'm a 1st level C# programming student, though I've been dabbling in programming for a few years, and learning above and beyond what the class teaches me is just what I'm doing so that I'm thoroughly prepared once I get out into the job environment. This particular class isn't OOP at all, that's actually the next class, but for this project the teacher said he wouldn't mind if we went above and beyond and did the project in OOP (in fact you can't get an A in his class unless you go above and beyond anyways).
The project is (at this point) to read in an XML file, byte by byte, store element tags in one array, and the data values in another. I fought with him on this (given the .NET Framework's built-in XML support), but that was a losing battle. He wants us to code this without using the .NET XML classes.
He did provide an example of OOP for this program that he slopped together (originally written in Java, ported to C++, then ported from C++ to C#)
In his example he's got three classes. The first, XMLObject, contains the arrays, a quasi-constructor, getter and setter methods (not properties, which I plan to fix in my version), and a method for adding the < and > to tags to be stored in the arrays (and output to the console if need be).
The second class is a parseXML class. In this one he has fields that keep track of the line count, file offset, tag offset, and strings to hold elements and data.
Again, he's got getter and setter methods, several parse methods that search for different things, and a general parse method that uses the other parse methods (sort of combines them). Some of these methods make calls to the XMLObject class's methods, and send the parsed element and data values to their respective arrays.
The third class he has is one that has no fields, and has two methods, one for doing ATOI and one for dumping a portion of the file stream to the console.
I know we're essentially building a less efficient version of what's already included in the .NET Framework. I've pointed this out to him and was told "do not use .NET's XML class, end of discussion", so let's all agree to just leave that one alone.
My question is: should those really be 3 separate classes? Shouldn't the parsing class either inherit from the XML object class or just be coded in the XML object class, and shouldn't the ATOI and dumping methods be in one of those two classes as well?
It makes sense to me that if the parsing class's aim in life is to parse an XML file and store elements and data fields in an array, it should be in the same class rather than being isolated and having to do it through getters and setters (or properties in the version I'm going to do). I don't see why the arrays would need to be encapsulated away from the parse methods that actually give them what to store.
Any help would be appreciated, as I'm still designing this and want to do it in as close to "proper" (I know it's a relative term) OOP form as I can.
The general rule is that we count the size of a class in the number of responsibilities that it has:
A class should have a single responsibility: a single reason to change.
It seems to me that your teacher did separate his responsibilities correctly. He separated the presentation from the xml parsing logic, and he separated the xml data from the xml parsing behavior.
First: if you're in a programming class, there may be a good reason he wants you to do this by hand. I really don't recommend arguing with your professors. You'll never win, and you can hurt your grades.
Second: his version is not too terrible (considering that it is largely a rewrite of parts of the System.Xml namespace). Basically, you have one class that "is" your XML. Think of it like the XDocument or XmlDocument classes: it just contains the XML itself. Then you have your XML parser: think of that like XmlReader. And your last one is sort of his equivalent of XmlWriter.
Remember that with OOP, your Xml class (the one that represents the document itself) should neither know nor care how it came into possession of the information it has. Further, the Parser should know how to get the Xml, but it shouldn't much care where it gets stored. Finally, your Writer class shouldn't really care where the data is coming from, only where it's going.
I know it's overused, but think of your program like a car: it has several parts that all have to work together, but you should be able to change any given part of it without majorly affecting the other pieces. If you lump everything into one class, you lose that flexibility.
Some points:
Classes are nouns; methods are verbs.
Your class should be called XmlParser.
Since the XML parser is neither part of the XMLObject nor extends the XMLObject, it should be a separate class.
The third class has nothing to do with either of the other two; it's just an ordinary Utilities class.
In general, each class should be responsible for a single unit of work or storage.
Don't try to put too much into a single class (see the "God object" anti-pattern).
There's nothing wrong with having lots of classes. (As long as they all make sense)
Let's summarize what the system must do:
read in an XML file, byte by byte,
store element tags in one array,
store the data values in another.
I would probably slice it up in the following way:
Reader: Given a file path, yields the contents byte-wise (IEnumerable<byte>)
Tokenizer: Given an enumeration of bytes, yields tokens relevant to the XML context (IEnumerable<XmlToken>)
XmlToken: Base class for any output that the tokenizer produces. For now you need 2 specializations:
Tag: An opening tag
Value: Contents of a tag
TokenDelegator: Accepts a Tokenizer and an instance of IXmlTokenVisitor
IXmlTokenVisitor: (See Visitor pattern)
TagAndValueStore: Implements IXmlTokenVisitor. Visit(Tag tag) and Visit(Value value) are implemented and the relevant content is stored in arrays.
You see, I ended up with 7 classes and 1 interface. But you may notice that you have laid the foundations for a fully-fledged XML parser.
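A compressed C# sketch of that decomposition, without using System.Xml. It assumes a simplified XML subset (single-byte text; closing tags, `<?...?>` declarations and `<!...>` markup are skipped; attributes stay inside the tag text) and stores into `List<string>` rather than fixed arrays. The type names follow the list above, but every member name (`ReadBytes`, `Tokenize`, `Accept`, `Dispatch`, `Tags`, `Values`) is an assumption, and here `TokenDelegator` takes the token sequence rather than a tokenizer object.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Reader: yields a file's contents one byte at a time.
public static class Reader
{
    public static IEnumerable<byte> ReadBytes(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            int b;
            while ((b = stream.ReadByte()) != -1)
                yield return (byte)b;
        }
    }
}

public interface IXmlTokenVisitor
{
    void Visit(Tag tag);
    void Visit(Value value);
}

// XmlToken and its two specializations; each token calls the matching
// Visit overload (double dispatch, as in the Visitor pattern).
public abstract class XmlToken
{
    protected XmlToken(string text) => Text = text;
    public string Text { get; }
    public abstract void Accept(IXmlTokenVisitor visitor);
}

public sealed class Tag : XmlToken
{
    public Tag(string name) : base(name) { }
    public override void Accept(IXmlTokenVisitor visitor) => visitor.Visit(this);
}

public sealed class Value : XmlToken
{
    public Value(string text) : base(text) { }
    public override void Accept(IXmlTokenVisitor visitor) => visitor.Visit(this);
}

// Tokenizer: turns bytes into Tag/Value tokens (simplified XML subset).
public static class Tokenizer
{
    public static IEnumerable<XmlToken> Tokenize(IEnumerable<byte> bytes)
    {
        var buffer = new StringBuilder();
        bool inTag = false;
        foreach (byte b in bytes)
        {
            char c = (char)b; // assumes single-byte (ASCII) input
            if (c == '<')
            {
                // Text collected since the last '>' is a value.
                string text = buffer.ToString().Trim();
                if (text.Length > 0)
                    yield return new Value(text);
                buffer.Clear();
                inTag = true;
            }
            else if (c == '>' && inTag)
            {
                // Skip closing tags, <?...?> declarations and <!...> markup.
                string tag = buffer.ToString();
                if (tag.Length > 0 && tag[0] != '/' && tag[0] != '?' && tag[0] != '!')
                    yield return new Tag(tag);
                buffer.Clear();
                inTag = false;
            }
            else
            {
                buffer.Append(c);
            }
        }
    }
}

// TokenDelegator: hands every token to a visitor.
public static class TokenDelegator
{
    public static void Dispatch(IEnumerable<XmlToken> tokens, IXmlTokenVisitor visitor)
    {
        foreach (XmlToken token in tokens)
            token.Accept(visitor);
    }
}

// TagAndValueStore: stores tags and values in separate lists.
public sealed class TagAndValueStore : IXmlTokenVisitor
{
    public List<string> Tags { get; } = new List<string>();
    public List<string> Values { get; } = new List<string>();
    public void Visit(Tag tag) => Tags.Add(tag.Text);
    public void Visit(Value value) => Values.Add(value.Text);
}
```

For a file, `TokenDelegator.Dispatch(Tokenizer.Tokenize(Reader.ReadBytes(path)), store)` chains the pieces; any one of them (for example a different visitor that only counts tags) can be replaced without touching the others.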
Often code that is sold as OO just plain isn't. A class should adhere to the Single Responsibility Principle.

Working around missing MI in C#

I have some code that gets passed a class derived from a certain class. Let's call this a parameter class.
The code uses reflection to walk the class' members and analyze certain custom attributes given to them. Basically, it's a configurable parser which will analyze input according to the attributes and put what it found into the data members.
This is used in several places in our code. You specify the parameter class, putting in attributed data members, and pass this to the parser. Something like this:
public class MyFancyParameters : ParametersBase
{
    [SomeAttribute(Name="blah", AnotherParam=true)]
    public string Blah { get; set; }
    // ... more such stuff
}
var parameters = new MyFancyParameters();
Parser.Parse(input, parameters);
In many places there are similar groups of attributed data members that need to get parsed. So the parameter classes are, in some places, similar. That's redundant and that, of course, hurts. Whenever I need a change in such an area, I need to make that change in half a dozen places, all clones. It's just a matter of time before these parts start to drift apart.
However, the similarities cannot be grouped in acyclic graphs, so I can't use single inheritance to group them.
What I would do in C++ is to put these chunks of similar stuff into their own classes, just inherit a bunch of them that contain whatever I need, and be done. (I think that's referred to as mix-in inheritance.)
C#, however, doesn't have multiple inheritance. So I was thinking of putting these chunks into data members and changing the parser to recurse into data members. But that would considerably complicate the parser.
What else is there?
Can you have your parser accept a collection of parameter classes instead of a single parameter class? Alternately, you could allow the parser to recurse into your parameter class and have it supply additional parameter classes as properties. Basically, every property of a ParametersBase derived class that inherits from type ParametersBase is recursed into and flattened into a single list of parameters.
Actually, I just saw that you already mentioned the recursive solution. I think this is probably your best bet and it's not too complex to support. You should be able to create a helper function for enumerating the parameter properties that makes a hierarchy look like a flat class.
Here's some code that would provide a 'flattened' view of your properties, if I understand your requirement correctly. You'll probably want to augment the production code with additional safeguards (such as keeping a stack of types to detect circular references).
using System;
using System.Collections.Generic;
using System.Reflection;

public class ParametersParser
{
    public static IEnumerable<PropertyInfo> GetAllParameterProperties(Type parameterType)
    {
        foreach (var property in parameterType.GetProperties())
        {
            // attributed properties are parameters in their own right
            if (Attribute.IsDefined(property, typeof(SomeAttribute)))
                yield return property;

            // properties whose type is itself a parameter class are flattened in
            if (typeof(ParametersBase).IsAssignableFrom(property.PropertyType))
            {
                foreach (var subProperty in GetAllParameterProperties(property.PropertyType))
                    yield return subProperty;
            }
        }
    }
}
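A sketch of the safeguard mentioned above, combined with the composition idea from the question: shared chunks become their own `ParametersBase` classes exposed as properties, and a `HashSet<Type>` tracks the types on the current recursion path so that a cycle raises an exception instead of recursing forever. `SomeAttribute`, `ParametersBase`, `PagingParameters` and the exception message are stand-ins, not the asker's real types.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-ins for the question's attribute and base class.
[AttributeUsage(AttributeTargets.Property)]
public class SomeAttribute : Attribute
{
    public string Name { get; set; } = "";
}

public abstract class ParametersBase { }

// A reusable "mix-in" chunk and a parameter class that composes it.
public class PagingParameters : ParametersBase
{
    [Some(Name = "page")]
    public int Page { get; set; }
}

public class MyFancyParameters : ParametersBase
{
    [Some(Name = "blah")]
    public string Blah { get; set; } = "";
    public PagingParameters Paging { get; set; } = new PagingParameters();
}

public static class ParametersParser
{
    public static IEnumerable<PropertyInfo> GetAllParameterProperties(Type parameterType)
        => Walk(parameterType, new HashSet<Type>());

    // 'path' holds the parameter types on the current recursion branch;
    // meeting one of them again means a circular reference.
    private static IEnumerable<PropertyInfo> Walk(Type type, HashSet<Type> path)
    {
        if (!path.Add(type))
            throw new InvalidOperationException($"Circular parameter reference through {type.Name}.");

        foreach (PropertyInfo property in type.GetProperties())
        {
            if (Attribute.IsDefined(property, typeof(SomeAttribute)))
                yield return property;

            if (typeof(ParametersBase).IsAssignableFrom(property.PropertyType))
            {
                foreach (PropertyInfo sub in Walk(property.PropertyType, path))
                    yield return sub;
            }
        }

        path.Remove(type); // branch finished; the same type may appear in a sibling
    }
}
```

A `PropertyInfo` alone does not say which nested instance it belongs to, so the real parser would also have to carry the owning object (for example via `property.GetValue(parent)`) while recursing before it can set values.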

How to use XmlSerializer to deserialize into an existing instance?

Is it somehow possible to use the XmlSerializer to deserialize its data into an existing instance of a class rather than into a new one?
This would be helpful in two cases:
Easily merge two XML files into one object instance.
Let the object constructor itself be the one that loads its data from the XML file.
If this is not possible by default, it should work by using reflection (copying each property after the deserialization), but this would be an ugly solution.
Basically, you can't. XmlSerializer is strictly constructive. The only interesting thing you can do to customize XmlSerializer is to implement IXmlSerializable and do everything yourself - not an attractive option (and it will still create new instances with the default constructor, etc).
Is xml a strict requirement? If you can use a different format, protobuf-net supports merging fragments into existing instances, as simply as:
Serializer.Merge(source, obj);
I think you're on the right track with the Reflection idea.
Since you probably have a wrapper around the XML operations anyway, you could take in the destination object, do the deserialization normally into a new object, then do something similar to cloning by copying over one by one only the properties holding non-default values.
It shouldn't be that complex to implement this, and it would look to consumers from the rest of your application just like in-place deserialization.
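A minimal sketch of such a wrapper. `XmlMerge.DeserializeInto` and the sample `Settings` type are hypothetical; the helper deserializes into a fresh instance with the ordinary `XmlSerializer.Deserialize(TextReader)` call, then copies each public read/write property whose value differs from a freshly constructed instance. Using a new instance (rather than `default`) as the baseline is an assumption, so constructor-initialized values count as "not present"; the flip side is that a value written in the XML that equals the constructor default is not copied.

```csharp
using System.IO;
using System.Reflection;
using System.Xml.Serialization;

// Sample type for illustration only.
public class Settings
{
    public string Name { get; set; } = "default";
    public int Port { get; set; }
    public string Host { get; set; } = "localhost";
}

public static class XmlMerge
{
    // Deserializes the XML, then copies only the "non-default" public
    // properties onto the existing target instance.
    public static void DeserializeInto<T>(TextReader xml, T target) where T : class, new()
    {
        var serializer = new XmlSerializer(typeof(T));
        var source = (T)serializer.Deserialize(xml);
        var baseline = new T(); // what "not present in the XML" looks like

        foreach (PropertyInfo p in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!p.CanRead || !p.CanWrite || p.GetIndexParameters().Length > 0)
                continue;

            var value = p.GetValue(source);
            if (!object.Equals(value, p.GetValue(baseline)))
                p.SetValue(target, value);
        }
    }
}
```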
I hit the same problem a few weeks ago.
I put a method Deserialize(string serializedForm) in the ISelfSerializable interface that an entity class of mine implemented. I also made sure the interface forced the class to have a default constructor.
In my factory I created an object of that type and then deserialized the string into it.
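A C# interface cannot itself require a constructor, so the "must have a default constructor" part is usually expressed as a `new()` constraint on the generic factory method. A sketch under that assumption; `ISelfSerializable`, `EntityFactory.Create` and the sample `Person` type (with a toy `first|last` format) are hypothetical.

```csharp
public interface ISelfSerializable
{
    // Populates this instance from its serialized form.
    void Deserialize(string serializedForm);
}

public static class EntityFactory
{
    // new() guarantees a public parameterless constructor exists.
    public static T Create<T>(string serializedForm) where T : ISelfSerializable, new()
    {
        var entity = new T();
        entity.Deserialize(serializedForm);
        return entity;
    }
}

// Sample entity; the "format" is just "first|last" for illustration.
public class Person : ISelfSerializable
{
    public string First { get; private set; } = "";
    public string Last { get; private set; } = "";

    public void Deserialize(string serializedForm)
    {
        string[] parts = serializedForm.Split('|');
        First = parts[0];
        Last = parts.Length > 1 ? parts[1] : "";
    }
}
```

Because `Deserialize` is an ordinary instance method, the same call also works on an instance that already exists, which is what the question was after.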
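This is not a thread-safe thing to do, but you can do:
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable]
public class c_Settings
{
    // Instance whose values new objects start from (shared static state, not thread-safe).
    static c_Settings Default;

    public static void SetExistingObject(c_Settings def)
    {
        Default = def;
    }

    public string Prop1;
    public bool Prop2;

    public c_Settings()
    {
        if (Default == null)
            return;

        // Copy every serializable member of Default into this new instance
        // (FormatterServices is marked obsolete in newer .NET versions).
        MemberInfo[] members = FormatterServices.GetSerializableMembers(typeof(c_Settings));
        FormatterServices.PopulateObjectMembers(this, members, FormatterServices.GetObjectData(Default, members));
    }
}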
This way you feed your object to the deserializer, and the deserializer only overwrites whatever is present in the XML.

Regarding Passing Many Parameters

I have around 8-9 parameters to pass to a function which returns an array. I would like to know whether it's better to pass those parameters directly to the function or to pass an array instead. Which is the better way, and why?
If I were to do anything, it would be to create a structure that holds all the parameters, to get nice IntelliSense and strong names.
public struct user
{
    public string FirstName;
    public string LastName;
    public string zilionotherproperties;
    public bool SearchByLastNameOnly;
}
public user[] GetUserData(user usr)
{
    //search for users using passed data and return an array of users.
}
Pass them individually, because:
that is the type-safe way.
IntelliSense will pick it up in Visual Studio and when you write your calling functions, you will know what's what.
It is faster to execute that way.
If the parameter really IS the array, though, then pass the array. Example:
For functions which look like this, use this notation:
Array FireEmployee(string first, string middle, string last, int id) {...}
For functions that look like this, use the array:
Array FireEmployees(Employee[] unionWorkers) {...}
Your scenario is covered by the Introduce Parameter Object refactoring in Martin Fowler's refactoring book. The book is well worth owning, but for those who don't, the refactoring is described here. There's also a preview on the publisher's site, and on Google books. It recommends replacing the parameters not with an array, but a new object.
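For comparison with the array idea, a minimal sketch of Introduce Parameter Object in C#. `UserSearchCriteria`, `User` and `UserRepository.Search` are hypothetical names; the parameter object here is an immutable class (get-only properties set in the constructor), and the optional `searchByLastNameOnly` parameter is just one way to give a field a default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Parameter object: groups the related search inputs under strong names.
public sealed class UserSearchCriteria
{
    public string FirstName { get; }
    public string LastName { get; }
    public bool SearchByLastNameOnly { get; }

    public UserSearchCriteria(string firstName, string lastName, bool searchByLastNameOnly = false)
    {
        FirstName = firstName ?? "";
        LastName = lastName ?? throw new ArgumentNullException(nameof(lastName));
        SearchByLastNameOnly = searchByLastNameOnly;
    }
}

public sealed class User
{
    public string FirstName { get; }
    public string LastName { get; }
    public User(string firstName, string lastName) { FirstName = firstName; LastName = lastName; }
}

public static class UserRepository
{
    // One well-named argument instead of 8-9 positional ones.
    public static User[] Search(IEnumerable<User> users, UserSearchCriteria criteria)
    {
        return users
            .Where(u => u.LastName == criteria.LastName
                        && (criteria.SearchByLastNameOnly || u.FirstName == criteria.FirstName))
            .ToArray();
    }
}
```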
Regarding Skeet's comment on my example above, that he would use a class instead of a structure, and to make it clearer where to use a class and where to use a structure, I'm posting this too. I think there are others out there who are curious about this as well.
The main reason to use a class, as far as I could see, was that you could make it immutable, but that's possible with structures too?
for example:
struct user
{
    public user(string Username, string LastName)
    {
        _username = Username;
    }
    private string _username;
    public string UserName {
        get { return _username; }
    }
}
For a long time I have felt that I don't know the differences between classes and structures anymore, now that we can have properties, initializers, fields and exactly everything that a class has in a structure too. I know classes are reference types and structures are value types, but what difference does it make in the case above, when the type is used as a parameter to a function?
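One concrete difference when such a type is a parameter: a struct argument is copied, so changes the callee makes to its copy are not seen by the caller, while a class argument refers to the same object. A small demonstration with hypothetical `UserStruct` and `UserClass` types, deliberately mutable so the copy becomes visible.

```csharp
public struct UserStruct
{
    public string Name;
}

public class UserClass
{
    public string Name = "";
}

public static class ParameterDemo
{
    // Receives a copy of the struct; the caller's value is untouched.
    public static void Rename(UserStruct user) => user.Name = "changed";

    // Receives a reference; the caller sees the change.
    public static void Rename(UserClass user) => user.Name = "changed";
}
```

For an immutable type like the `user` struct above, the copy is invisible to the caller, so the practical difference is mostly cost: each call copies all of the struct's fields, whereas a class is one heap allocation passed around by reference.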
I found this description of the differences on the site http://www.startvbdotnet.com/oop/structure.aspx and that description is exactly how I mapped it in my head:
Structures can be defined as a tool for handling a group of logically related data items. They are user-defined and provide a method for packing together data of different types. Structures are very similar to Classes. Like Classes, they too can contain members such as fields and methods. The main difference between classes and structures is, classes are reference types and structures are value types. In practical terms, structures are used for smaller lightweight objects that do not persist for long and classes are used for larger objects that are expected to exist in memory for long periods.
Maybe this should be its own question, but I felt it was related, since we all had different views on the structure-vs-class question for parameters.
I assume you're using C# 4 and can just use named parameters:
FireEmployee(
    first: "Frank",
    middle: "",
    last: "Krueger",
    id: 338);
These make the code almost as readable as VB or Smalltalk. :-)
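C# 4 added optional parameters in the same release as named arguments, and the two combine well: callers name only the arguments they care about and the rest fall back to declared defaults. A sketch; `Employees.Fire`, its parameters and its string return value are hypothetical, chosen only to keep the example checkable.

```csharp
public static class Employees
{
    // Optional parameters (C# 4+) with defaults; callers may name any subset.
    public static string Fire(string first, string last, string middle = "", int id = 0, bool notifyUnion = true)
    {
        string fullName = middle.Length > 0 ? $"{first} {middle} {last}" : $"{first} {last}";
        return $"{fullName} (#{id}), union notified: {notifyUnion}";
    }
}
```

A call such as `Employees.Fire(first: "Frank", last: "Krueger", id: 338)` leaves `middle` and `notifyUnion` at their defaults.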
If not, I would go with what Dave Markle has to say.
If this is library code that will see a lot of use, and if some of the parameters have typical values that are candidates for default values, then you should consider Dave Markle's advice, and provide a selection of overloads with progressively fewer parameters. This is the approach recommended in the Microsoft Framework Design Guidelines.
Alternately, you can get a similar effect with Stefan's approach, by setting default values with member initializers and using a progression of ctor overloads.
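A sketch of the overload progression, with each shorter overload forwarding to the next longer one and supplying a default. `UserSearch.Find`, its parameters and the default of 100 are hypothetical; the method just echoes its arguments so the forwarding is visible.

```csharp
public static class UserSearch
{
    // Shortest overload: only the most commonly needed argument.
    public static string Find(string lastName)
        => Find(lastName, "");

    public static string Find(string lastName, string firstName)
        => Find(lastName, firstName, 100);

    // The full overload does the real work (here it just echoes its inputs).
    public static string Find(string lastName, string firstName, int maxResults)
        => $"{lastName}|{firstName}|{maxResults}";
}
```

Keeping all the logic in the longest overload means defaults live in exactly one place per level, which is what makes this pattern safe to extend.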
If you really don't want to pass in your arguments separately, I would suggest creating a new class which encapsulates all of your arguments. You can (in Java, and also in C#) declare a public nested class inside the class containing the gnarly method for this purpose. This avoids having classes floating around which are really just helper types.
I would say pass them individually as well. I don't like the idea of creating a class, then passing that class through as an argument. It's a form of stamp coupling, which means making changes will be harder since one class uses the other. And reusing one class means you would have to reuse the other as well.
You could use an interface to reduce stamp coupling, but that's too much overhead for my tastes, so that's why I like to pass the arguments individually.
Do you really need 8-9 parameters for a single function? It seems to me that if you need that many parameters, then you're probably doing too many different things in that function. Try refactoring the code into separate functions so that each function has exactly one purpose.
Do not pass them as an array unless the function acts on an array. I wouldn't create a new data structure to group the parameters either, for the following reasons:
1. Passing a new data structure hides what the function really needs as input (does it need all of the data structure, or only part of it?).
2. Related to 1, it makes unit tests (UTs) more difficult: when writing a UT you need to recreate the entire data structure.
3. If the input parameters are not related, you end up with a new data structure that groups unrelated data types for no other reason than to make a function call look neater.
4. If you choose to pass the new data structure to your function, the function cannot be used in a scope where the new data structure is not available.
Really, the only disadvantage to passing each parameter to the function is that you might not be able to fit the call on one line of code, but don't forget the lines you need before the call to fill in your data structure.
