Messages types with DataFlow Blocks


I am teaching myself TPL Dataflow, and I have read that using immutable objects for messages is the way to go.
To comply with this, I have designed specific classes for every block's inputs and outputs.
Unfortunately, when I link my blocks to each other, the input and output types are so different that it leads to a proliferation of TransformBlocks:
var proc1 = new TransformBlock<proc1In, proc1Out>(...
var convertOut1ToIn2 = new TransformBlock<proc1Out, proc2In>(p1 => new proc2In { ...
var proc2 = new TransformBlock<proc2In, proc2Out>(...
proc1.LinkTo(convertOut1ToIn2);
convertOut1ToIn2.LinkTo(proc2);
Using Batch and Join blocks later to merge the results together leaves me struggling with very messy code.
Every sample I have read on the internet uses simple types such as int or string. I have not found anything that deals with more complex types.
I feel the urge to use a single big object and pass its reference through all the blocks. Before making this mistake, I would like to know if there is a better way.

After some time experimenting with TPL Dataflow, it turns out that:
Envisioning Dataflow as a conveyor belt carrying items towards different workstations, where the items are enriched and built up, is completely wrong: working this way leads to excruciatingly hard concurrency issues. Dataflow is a messaging system.
Instead, I find it better to picture it as a mesh of people who deal with external facilities to get things done (IO, database persistence, calculation engines...).
The problem of message types I ran into is easily circumvented using Tuples. In general I dislike the ugliness of Tuples, but in this situation I feel they really fit.
My problem is the analysis of multiple pictures. Instead of having blocks pass a "WorkItem" object to each other and modify it, I use a separate "WorkItemSupplier" class. This class holds a ConcurrentDictionary of WorkItems and exposes methods to deal with them.
This way, my Dataflow blocks only pass the ID of a work item to each other, and they use the WorkItemSupplier as an external facility to store, retrieve, or change the state of any work item.
As a result, the code runs much more smoothly, is well separated, and is easier to read.
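A minimal sketch of the ID-passing idea described above, assuming .NET 5 or later (records, and the System.Threading.Tasks.Dataflow library available). The WorkItem shape, the status strings, and every member of WorkItemSupplier are hypothetical; the answer only says that the class wraps a ConcurrentDictionary and exposes methods to work with items.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks.Dataflow;

// Hypothetical immutable snapshot of a work item's state.
public sealed record WorkItem(int Id, string ImagePath, string Status);

// Hypothetical "external facility": blocks hold only an ID and go through this class.
public sealed class WorkItemSupplier
{
    private readonly ConcurrentDictionary<int, WorkItem> _items = new();

    public void Add(WorkItem item) => _items[item.Id] = item;

    public WorkItem Get(int id) => _items[id];

    // Replaces the stored item with a copy carrying the new status; unknown IDs throw.
    public void SetStatus(int id, string status) =>
        _items.AddOrUpdate(
            id,
            _ => throw new InvalidOperationException($"Unknown work item {id}"),
            (_, current) => current with { Status = status });
}

public static class PipelineDemo
{
    public static string Run()
    {
        var supplier = new WorkItemSupplier();
        supplier.Add(new WorkItem(1, "picture1.png", "New"));

        // Every block is typed on int: only the ID travels through the mesh.
        var load = new TransformBlock<int, int>(id =>
        {
            supplier.SetStatus(id, "Loaded");
            return id;
        });
        var analyze = new ActionBlock<int>(id => supplier.SetStatus(id, "Analyzed"));
        load.LinkTo(analyze, new DataflowLinkOptions { PropagateCompletion = true });

        load.Post(1);
        load.Complete();
        analyze.Completion.Wait();
        return supplier.Get(1).Status;
    }
}
```

Because every block has the same input and output type, no converter blocks are needed between stages. The trade-off is that the supplier becomes shared state, so its methods have to stay thread-safe.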

Related

Maintaining modularity in Main()?

I'm writing the simple card game "War" for homework and now that the game works, I'm trying to make it more modular and organized. Below is a section of Main() containing the bulk of the program. I should mention, the course is being taught in C#, but it is not a C# course. Rather, we're learning basic logic and OOP concepts so I may not be taking advantage of some C# features.
bool sameCard = true;
while (sameCard)
{
sameCard = false;
card1.setVal(random.Next(1,14)); // set card value
val1 = determineFace(card1.getVal()); // assign 'face' cards accordingly
suit = suitArr[random.Next(0,4)]; // choose suit string from array
card1.setSuit(suit); // set card suit
card2.setVal(random.Next(1,14)); // rinse, repeat for card2...
val2 = determineFace(card2.getVal());
suit = suitArr[random.Next(0,4)];
card2.setSuit(suit);
// check if same card is drawn twice:
catchDuplicate(ref card1, ref card2, ref sameCard);
}
Console.WriteLine ("Player: {0} of {1}", val1, card1.getSuit());
Console.WriteLine ("Computer: {0} of {1}", val2, card2.getSuit());
// compare card values, display winner:
determineWinner(card1, card2);
So here are my questions:
Can I use loops in Main() and still consider it modular?
Is the card-drawing process written well/contained properly?
Is it considered bad practice to print messages in a method (i.e.: determineWinner())?
I've only been programming for two semesters and I'd like to form good habits at this stage. Any input/advice would be much appreciated.
Edit:
catchDuplicate() is now a boolean method and the call looks like this:
sameCard = catchDuplicate(card1, card2);
thanks to @Douglas.

Can I use loops in Main() and still consider it modular?
Yes, you can. However, more often than not, Main in OOP-programs contains only a handful of method-calls that initiate the core functionality, which is then stored in other classes.
Is the card-drawing process written well/contained properly?
Partially. If I understand your code correctly (you only show Main), you undertake some actions that, when done in the wrong order or with the wrong values, may not end up well. Think of it this way: if you sell your class library (not the whole product, but only your classes), what would be the clearest way to use your library for an uninitiated user?
For example, consider a class Deck that contains a deck of cards. On creation it creates all the cards and shuffles them. Give it a method Shuffle so the user of your class can reshuffle the deck when needed, and add methods like DrawCard for dealing cards.
Further: you have methods that are not contained within a class of their own yet have functionality that would be better off in a class. For example, determineFace is better suited to be a method on class Card (assuming card2 is of type Card).
Is it considered bad practice to print messages in a method (i.e.: determineWinner())?
Yes and no. If you only want messages to be visible during testing, use Debug.WriteLine. In a production build, these will be no-ops. However, when you write messages in a production version, make sure that this is clear from the name of the method. I.e., WriteWinnerToConsole or something.
It's more common to not do this because: what format would you print the information? What text should come with it? How do you handle localization? However, when you write a program, obviously it must contain methods that write stuff to the screen (or form, or web page). These are usually contained in specific classes for that purpose. Here, that could be the class CardGameX for instance.
General thoughts
Think about the principle "one method/function should have one task and one task only, and it should not have side effects" (for example, if a method calculates a square and prints it, the printing is the side effect).
The principle for classes is, very high-level: a class contains methods that logically belong together and operate on the same set of properties/fields. An example of the opposite: Shuffle should not be a method in class Card. However, it would belong logically in the class Deck.
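The Deck and Card classes suggested above could be sketched roughly as follows. The member names (Face, Count, DrawCard) and the choice to pass a Random into the constructor are assumptions made for illustration, not part of the original homework code.

```csharp
using System;
using System.Collections.Generic;

public enum Suit { Clubs, Diamonds, Hearts, Spades }

public sealed class Card
{
    public int Value { get; }   // 1 (ace) through 13 (king), as in random.Next(1, 14)
    public Suit Suit { get; }

    public Card(int value, Suit suit) { Value = value; Suit = suit; }

    // The face name lives on the card itself instead of in a free-standing determineFace.
    public string Face => Value switch
    {
        1 => "Ace",
        11 => "Jack",
        12 => "Queen",
        13 => "King",
        _ => Value.ToString()
    };

    public override string ToString() => $"{Face} of {Suit}";
}

public sealed class Deck
{
    private readonly List<Card> _cards = new List<Card>();
    private readonly Random _random;

    // Creates all 52 cards and shuffles them.
    public Deck(Random random)
    {
        _random = random;
        foreach (Suit suit in Enum.GetValues(typeof(Suit)))
            for (int value = 1; value <= 13; value++)
                _cards.Add(new Card(value, suit));
        Shuffle();
    }

    public int Count => _cards.Count;

    // Fisher-Yates shuffle of the remaining cards.
    public void Shuffle()
    {
        for (int i = _cards.Count - 1; i > 0; i--)
        {
            int j = _random.Next(i + 1);
            (_cards[i], _cards[j]) = (_cards[j], _cards[i]);
        }
    }

    // Removes and returns the top card, so the same card cannot be drawn twice.
    public Card DrawCard()
    {
        if (_cards.Count == 0) throw new InvalidOperationException("The deck is empty.");
        Card top = _cards[_cards.Count - 1];
        _cards.RemoveAt(_cards.Count - 1);
        return top;
    }
}
```

With a deck like this, the duplicate check in Main (catchDuplicate) is no longer needed, because drawing removes the card from the deck.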

If the main goal of your homework is to create a modular application, you must encapsulate all the logic in specialized classes.
Each class must do only one job.
Functions that operate on a card belong in a card class.
Functions that draw cards belong in another class.
I think that is the goal of your homework. Good luck!

Take all advice on "best practices" with a grain of salt. Always think for yourself.
That said:
Can I use loops in Main() and still consider it modular?
The two concepts are independent. If your Main() only does high-level logic (i.e. calls other methods) then it does not matter if it does so in a loop, after all the algorithm requires a loop. (you wouldn't add a loop unnecessarily, no?)
As a rule of thumb, if possible/practical, make your program self-documenting. Make it "readable" so, if a new person (or even you, a few months from now) looks at it they can understand it at any level.
Is the card-drawing process written well/contained properly?
No. First of all, a card should never be selected twice. For a more "modular" approach I would have something like this:
while ( Deck.NumCards >= 2 )
{
Card card1 = Deck.GetACard();
Card card2 = Deck.GetACard();
PrintSomeStuffAboutACard( GetWinner( card1, card2 ) );
}
Is it considered bad practice to print messages in a method (ie: determineWinner())?
Is the purpose of determineWinner to print a message? If the answer is "No", then it is not a matter of "bad practice": your function is plain wrong.
That said, there is such a thing as a "debug" build and a "release" build. To aid you in debugging the application and figuring out what works and what doesn't it is a good idea to add logging messages.
Make sure they are relevant and that they are not executed in the "release" build.

Q: Can I use loops in Main() and still consider it modular?
A: Yes, you can use loops, that doesn't really have an impact on modularity.
Q: Is the card-drawing process written well/contained properly?
A: If you want to be more modular, turn DrawCard into a function/method. Maybe just write DrawCards instead of DrawCard, but then there's an optimization-versus-modularity question there.
Q: Is it considered bad practice to print messages in a method (ie: determineWinner())?
A: I wouldn't say printing messages in a method is bad practice, it just depends on context. Ideally, the game itself doesn't handle anything but game logic. The program can have some kind of game object and it can read state from the game object. This way, you could technically change the game from being text-based to being graphical. I mean, that's ideal for modularity, but it may not be practical given a deadline. You always have to decide when you have to sacrifice a best practice because there isn't enough time. Sadly, this is all too often a common occurrence.
Separate game logic from the presentation of it. With a simple game like this, it's an unnecessary dependency.
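One way to picture that separation is a rules class that returns a result and a presenter that turns the result into text. This is only a sketch: RoundResult, WarRules, and ConsolePresenter are hypothetical names, and the rule that the higher value wins is an assumption about how the homework compares cards.

```csharp
using System;

public enum RoundResult { PlayerWins, ComputerWins, Tie }

public static class WarRules
{
    // Pure game logic: compares two card values and returns the outcome without printing.
    // Assumption: the higher value wins and equal values tie.
    public static RoundResult DetermineWinner(int playerValue, int computerValue)
    {
        if (playerValue > computerValue) return RoundResult.PlayerWins;
        if (playerValue < computerValue) return RoundResult.ComputerWins;
        return RoundResult.Tie;
    }
}

public static class ConsolePresenter
{
    // Presentation only: turns an outcome into display text.
    public static string Describe(RoundResult result) => result switch
    {
        RoundResult.PlayerWins => "Player wins!",
        RoundResult.ComputerWins => "Computer wins!",
        _ => "It's a tie."
    };

    public static void Show(RoundResult result) => Console.WriteLine(Describe(result));
}
```

Main would then call something like ConsolePresenter.Show(WarRules.DetermineWinner(v1, v2)), and a graphical front end could reuse WarRules unchanged.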

OOP data structure advice

I am writing a log file decoder which should be capable of reading many different structures of files. My question is how best to represent this data. I am using C#, but am new to OOP.
An example:
The log files have a range of sensor values. One sensor reading can be called A, another B. Obviously, there are many more than 2 entry types.
In different log files, they could be stored either as ABABABABAB or AAAAABBBBB.
I was thinking of describing this as blocks of entries. So in the first case, a block would be 'AB', with 5 blocks. In the second case, the first block is 'A', read 5 times. This is followed by a block of 'B', read 5 times.
This is quite a simplification (there are actually 40 different types of log file, each with up to 40 sensor values in a block). No log has more than 300 blocks.
At the moment, I store all of this in a datatable. I have a column for each entry, with a property of how many to read. If this is set to -1, it continues to the next column in the block. If not, it will assume that it has reached the end of the block.
This all seems quite clumsy. Can anyone suggest a better way of doing this?

I think you should first start here, and then here to learn a little bit about what object oriented programming is. Don't worry about your current problem while learning about OOP.
As you are learning about OO concepts, you should begin to understand that code is not data, and data is not code. From an OOP standpoint, it does not matter how you represent your data. You can write OO code to consume your data, or you could write procedural code to consume it; that choice is independent of the format of the data.
So then getting back to your question
My question is how best to represent this data
It depends on your needs. What is writing the log file? Do you have control over the writer and the reader? If I did, I would rely on the built-in serialization methods to minimize the amount of code I need to write. Is the log file going to be really long? If so, the "datatable" approach you described is usually better. If the log file isn't going to be huge, XML is really easy to work with.

Very basic and straightforward:
Define an interface IEntry with properties such as string EntryBlock and int Count.
Define a class Entry that represents an entry and implements IEntry.
The code that does the binary serialization should depend on the interfaces; for instance, it should refer to IEnumerable<IEntry>.
The Entry class could override ToString() to return something like [ABAB-2], if that would be helpful during serialization.
The IEntry interface could provide a method void CreateFromRawString(string rawDataFromLog) if you find that helpful; decide for yourself.
If you want more information, please share the code you are using for serialization/deserialization.
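The points above might translate into something like the sketch below. IEntry, EntryBlock, Count, and the [ABAB-2] style of ToString() come from the answer; the LogLayout.Expand helper is a hypothetical addition that shows how a list of entries describes both of the question's layouts.

```csharp
using System.Collections.Generic;
using System.Linq;

public interface IEntry
{
    string EntryBlock { get; }  // sensor names read in order within one block, e.g. "AB"
    int Count { get; }          // how many times the block repeats
}

public sealed class Entry : IEntry
{
    public string EntryBlock { get; }
    public int Count { get; }

    public Entry(string entryBlock, int count)
    {
        EntryBlock = entryBlock;
        Count = count;
    }

    public override string ToString() => $"[{EntryBlock}-{Count}]";
}

public static class LogLayout
{
    // Expands a layout into the flat order in which sensor values appear in the file.
    public static string Expand(IEnumerable<IEntry> layout) =>
        string.Concat(layout.SelectMany(e => Enumerable.Repeat(e.EntryBlock, e.Count)));
}
```

For example, a single Entry("AB", 5) describes the ABABABABAB file, while Entry("A", 5) followed by Entry("B", 5) describes AAAAABBBBB, without the -1 marker the datatable approach relies on.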

In addition to what Bob has offered, I highly recommend Head First Design Patterns as a gentle, but robust introduction to OO for a C# programmer. The samples are in Java, which translate easily to C#.

As for OOP, you want to learn SOLID.
I would suggest you build this using Test Driven Development.
Start small, with a simple fragment of your log data and write a test like (you'll find a better way to do this with experience and apply it to your situation):
[Test]
public void ReadSequence_FiveA_ReturnsProperList()
{
// Arrange
string sequenceStub = "AAAAA";
// Act
MyFileDecoder decoder = new MyFileDecoder();
List<string> results = decoder.ReadSequence(sequenceStub);
// Assert
Assert.AreEqual(5, results.Count);
Assert.AreEqual("A", results[0]);
}
That test code snippet is just a starting point, and I've tried to be rather verbose in the assertions. You can come up with more creative ways over time. The point is to start small. Once this test passes, add another test where you mix "AB" and change your decoder to handle this properly. Eventually, you'll have a large set of tests that handle your different formats. Using TDD, you'll be on the path to using SOLID properly. Whenever you find something you can't test, you should review the rules and see if you can't make it simpler and inject dependencies.
Eventually you'll get into mocking. For example, you might find that you'd rather INJECT the ability for your MyFileDecoder class to have a dependency that will read your log file. In that case, you would create a mock object and pass that into the constructor and set the mock to return the sequenceStub when a method is called.
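As a rough illustration of the injection idea, the sketch below uses a hand-written stub rather than a mocking library. ILogSource and StubLogSource are hypothetical names, and this MyFileDecoder differs from the one in the test above: it takes its input through the constructor, and its one-result-per-character decoding is only a stand-in for real parsing.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical dependency: something that can supply the raw log text.
public interface ILogSource
{
    string ReadAll();
}

public sealed class MyFileDecoder
{
    private readonly ILogSource _source;

    // The source is injected, so tests never have to touch the file system.
    public MyFileDecoder(ILogSource source) { _source = source; }

    // Simplified stand-in for real decoding: one result per character.
    public List<string> ReadSequence() =>
        _source.ReadAll().Select(c => c.ToString()).ToList();
}

// Hand-written stub that returns a fixed sequence, playing the role of the mock.
public sealed class StubLogSource : ILogSource
{
    private readonly string _content;
    public StubLogSource(string content) { _content = content; }
    public string ReadAll() => _content;
}
```

A test would then construct new MyFileDecoder(new StubLogSource("AAAAA")) and assert on the returned list, while production code would pass in an implementation that reads the real file.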

Declaring and creating an object then adding to collection VS Adding object to collection using new keyword to create object

OK, so the title may have been confusing, so I have posted two code snippets to illustrate what I mean.
NOTE: allUsers is just a collection.
RegularUser regUser = new RegularUser(userName, password, name, emailAddress);
allUsers.Add(regUser);
VS
allUsers.Add(new RegularUser(userName, password, name, emailAddress));
Which snippet A or B is better and why?
What are the advantages or disadvantages?
The example i wrote was C# but does the language (C#, Java etc) make a difference?

As far as C# is concerned, both of your code examples are practically identical at the IL level. The second example still creates a reference to the created object and pushes it onto the stack; you just don't have a local variable hooked up to it. This will not create any performance problems at all.

1) Which snippet A or B is better and why?
They're really identical. The compiled code will be nearly identical, since a temporary object is pushed onto the stack, then used in the method call.
2) What are the advantages or disadvantages?
The main advantages and disadvantages to the approach are really just readability.
Your first example has the advantage of keeping a single "operation" per line of code, which, in many ways, is more maintainable.
The second example removes the unnecessary variable declaration, which may be more maintainable.
Personally, I feel that the number of parameters in your RegularUser constructor would probably push me, in this instance, towards your first option. I typically find that, when a line of code gets to be more than about half a screen width on an average monitor, it's easier to read and understand if it's split up. Splitting this up by introducing the temporary and calling Add separately makes this more clear.
However, if you're just adding an integer or a class that's very small, I'd probably vote to skip the unnecessary variable. This is completely a personal preference, however - your mileage may (and probably will) vary.
3) The example i wrote was C# but does the language (C#, Java etc) make a difference?
No, for the most part. This is really language/implementation dependent, but most languages will have the same basic behavior and performance in both cases. It is possible (and highly likely) that some languages may treat this differently, but most mainstream languages will not.

I really like to create them the first way unless I really really know what is going on. It is much harder to do debugging if you don't create the object first...
The compiler will just turn the 2nd version into the 1st for you, anyway, so there isn't a net negative effect.
Pros of #1:
easier to debug (!)
theoretically easier to read, clearer
can use the object later
Cons:
more verbose
can be unnecessary, especially for trivial objects
Result:
1 for anything complex to create, or that may need to be inspected easily at debug time
2 for lots of annoying little stuff, like the following.
var list = new List<NameValuePair>(3);
list.Add(new NameValuePair("name", "valuable"));
list.Add(new NameValuePair("age", "valuable"));
list.Add(new NameValuePair("height", "not valuable"));
var dates = new List<DateTime>();
dates.Add(DateTime.Now);
dates.Add(DateTime.Now.Date.AddYears(-2));
As far as I know there isn't a real difference between languages when it comes to this. Some may not allow it, though.

Both are equal in terms of performance.
In terms of maintainability the second case is a nightmare: it is (nearly) impossible to trace in a debugger. So I tend to prefer the first one. In my early OOP days I was always writing the second, because "I knew that they were objects and I was sooo good at grasping objects that I ... blah blah blah", but that wore off with time, and especially with maintenance time.
Also, suppose that someone wants you to
FilterClass.FilterUser(regUser)
or
Database.AddToDatabase(regUser)
because it is the right place to do so, the first scenario is better.
Finally, when do you stop?
allUsers.Add(new RegularUser(new ReadFromInput(new EscapedName(new Name(new String(userName)))), password, name, emailAddress));

Thread Safe Class Library Design

I'm working on a class library and have opted for a route with my design to make implementation and thread safety slightly easier, however I'm wondering if there might be a better approach.
A brief background: I have a multi-threaded heuristic algorithm within a class library that, once set up with a scenario, should attempt to solve it. Obviously I want it to be thread safe, and I don't want a change made to anything while it is solving to cause crashes or errors.
My current approach is that, if I have a class A, I create a number of InternalA instances for each A instance. InternalA has many of the important properties from the A class, but is internal and inaccessible outside the library.
The downside of this, is that if I wish to extend the decision making logic (or actually let someone do this outside the library) then it means I need to change the code within the InternalA (or provide some sort of delegate function).
Does this sound like the right approach?

It's hard to really say from just that - but I can say that if you can make everything immutable, your life will be a lot easier. Look at how functional languages approach immutable data structures and collections. The less shared mutable data you have, the simpler threading will be.

Why Not?
Create a generic class that accepts a class with two members (e.g. Lock/Unlock), so you could provide:
A thread-safe implementation (which can use Monitor.Enter/Exit inside)
A system-wide safe implementation (using Mutex)
An unsafe but fast implementation (with empty method bodies)
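A sketch of that idea follows, with a counter standing in for whatever state needs protecting. ILockPolicy, the three policy classes, and Counter<TLock> are all hypothetical names. The Mutex variant is only system-wide when given a name, and a real implementation would also dispose it.

```csharp
using System.Threading;

public interface ILockPolicy
{
    void Lock();
    void Unlock();
}

// Thread-safe within one process, using Monitor.
public sealed class MonitorLock : ILockPolicy
{
    private readonly object _gate = new object();
    public void Lock() => Monitor.Enter(_gate);
    public void Unlock() => Monitor.Exit(_gate);
}

// Safe across processes when the same name is used everywhere.
public sealed class MutexLock : ILockPolicy
{
    private readonly Mutex _mutex;
    public MutexLock(string name) { _mutex = new Mutex(false, name); }
    public void Lock() => _mutex.WaitOne();
    public void Unlock() => _mutex.ReleaseMutex();
}

// No protection at all: fast, but only valid for single-threaded use.
public sealed class NoLock : ILockPolicy
{
    public void Lock() { }
    public void Unlock() { }
}

// Example of a class whose thread safety is chosen by the caller through TLock.
public sealed class Counter<TLock> where TLock : ILockPolicy
{
    private readonly TLock _lock;
    private int _value;

    public Counter(TLock lockPolicy) { _lock = lockPolicy; }

    public int Increment()
    {
        _lock.Lock();
        try { return ++_value; }
        finally { _lock.Unlock(); }
    }
}
```

A caller that knows it is single-threaded can use Counter<NoLock> and pay nothing for locking, while the library's multi-threaded solver would use Counter<MonitorLock>.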

Another way I have had some success with is using interfaces to achieve functional separation. The cost of this approach is that you end up with some fields 'repeated', because each interface requires total separation from the others' fields.
In my case I had two threads that needed to pass over a set of data that is potentially large and needs as little garbage collection as possible. That is, I only want to pass change information from the first stage to the second, and then have the first stage process the next work unit.
This was achieved by using change buffers to pass changes from one interface to the next.
This allows one thread to work away at one interface, make all its changes, and then publish a struct containing the changes that the other interface (thread) needs to apply prior to its work.
By doing this you have a double buffer (thread 1 produces a change report whilst thread 2 consumes the last report). If you add more interfaces (and threads), it looks like pulses of work moving through the threads.
This was based on my research, and I have no doubt that there are better methods available now.
My aim when coming up with this, however, was to avoid the need for locks in the vast majority of the code by designing out race conditions. The other major consideration is garbage collection performance, which may not be an issue for you.
This way is all good until you need complex interactions between threads. Then you find that you start forcing the layout of your buffer structures for reuse to get around inheritance, which in turn has an upkeep overhead.

A little more information on the problem to help...
The heuristic I'm using is to solve TSP-like problems. What happens right at the start of each calculation is that all the aspects that form the problem (salesmen/places to visit) are cloned so they aren't affected across threads.
This means each thread can change data (such as the stock left on a salesman), as there are a number of values that change during the calculation as things progress. What I'd quite like to do is allow checks such as HasSufficientStock(), as a simple example, to be overridden by a developer using the library.
Unfortunately, at present, to add further protection across threads and to make some simpler, more lightweight classes, I convert them to these internal classes, and these are the things that are actually used and cloned.
For example
class A
{
public double Stock { get; }
// Processing and cloning actually works using these InternalA's
internal InternalA ConvertToInternal() {}
}
internal class InternalA : ICloneable
{
public double Stock { get; set; }
public bool HasSufficientStock() {}
}

Large Switch statements: Bad OOP?

I've always been of the opinion that large switch statements are a symptom of bad OOP design. In the past, I've read articles that discuss this topic, and they have provided alternative OOP-based approaches, typically based on polymorphism to instantiate the right object to handle the case.
I'm now in a situation that has a monstrous switch statement based on a stream of data from a TCP socket, in which the protocol consists basically of a newline-terminated command, followed by lines of data, followed by an end marker. The command can be one of 100 different commands, so I'd like to find a way to reduce this monster switch statement to something more manageable.
I've done some googling to find the solutions I recall, but sadly, Google has become a wasteland of irrelevant results for many kinds of queries these days.
Are there any patterns for this sort of problem? Any suggestions on possible implementations?
One thought I had was to use a dictionary lookup, matching the command text to the object type to instantiate. This has the nice advantage of merely creating a new object and inserting a new command/type in the table for any new commands.
However, this also has the problem of type explosion. I now need 100 new classes, plus I have to find a way to interface them cleanly to the data model. Is the "one true switch statement" really the way to go?
I'd appreciate your thoughts, opinions, or comments.

You may get some benefit out of a Command Pattern.
For OOP, you may be able to collapse several similar commands each into a single class, if the behavior variations are small enough, to avoid a complete class explosion (yeah, I can hear the OOP gurus shrieking about that already). However, if the system is already OOP, and each of the 100+ commands is truly unique, then just make them unique classes and take advantage of inheritance to consolidate the common stuff.
If the system is not OOP, then I wouldn't add OOP just for this... you can easily use the Command Pattern with a simple dictionary lookup and function pointers, or even dynamically generated function calls based on the command name, depending on the language. Then you can just group logically associated functions into libraries that represent a collection of similar commands to achieve manageable separation. I don't know if there's a good term for this kind of implementation... I always think of it as a "dispatcher" style, based on the MVC-approach to handling URLs.
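The "dispatcher" style with a dictionary lookup could look roughly like this in C#, with delegates playing the role of function pointers. The CommandDispatcher class, the ECHO and COUNT commands, and the error text are hypothetical and not part of the protocol in the question.

```csharp
using System;
using System.Collections.Generic;

public sealed class CommandDispatcher
{
    // Maps a command name to a handler that takes the data lines and returns a reply.
    private readonly Dictionary<string, Func<IReadOnlyList<string>, string>> _handlers =
        new Dictionary<string, Func<IReadOnlyList<string>, string>>(StringComparer.OrdinalIgnoreCase);

    public void Register(string command, Func<IReadOnlyList<string>, string> handler) =>
        _handlers[command] = handler;

    // Looks the command up once; unknown commands get a single shared error path.
    public string Dispatch(string command, IReadOnlyList<string> data) =>
        _handlers.TryGetValue(command, out var handler)
            ? handler(data)
            : $"ERR unknown command '{command}'";
}

public static class DispatcherDemo
{
    public static CommandDispatcher Build()
    {
        var dispatcher = new CommandDispatcher();
        dispatcher.Register("ECHO", data => string.Join(" ", data));
        dispatcher.Register("COUNT", data => data.Count.ToString());
        return dispatcher;
    }
}
```

Adding a command then means adding one Register call, and related handlers can live as methods in whichever class groups them logically, without one class per command.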

I see having two switch statements as a symptom of non-OO design, where the switch-on-enum-type might be replaced with multiple types which provide different implementations of an abstract interface; for example, the following ...
switch (eFoo)
{
case Foo.This:
eatThis();
break;
case Foo.That:
eatThat();
break;
}
switch (eFoo)
{
case Foo.This:
drinkThis();
break;
case Foo.That:
drinkThat();
break;
}
... should perhaps be rewritten as ...
interface IAbstract
{
void eat();
void drink();
}
class This : IAbstract
{
public void eat() { ... }
public void drink() { ... }
}
class That : IAbstract
{
public void eat() { ... }
public void drink() { ... }
}
However, one switch statement isn't imo such a strong indicator that the switch statement ought to be replaced with something else.

The command can be one of 100 different commands
If you need to do one out of 100 different things, you can't avoid having a 100-way branch. You can encode it in control flow (switch, if-elseif^100) or in data (a 100-element map from string to command/factory/strategy). But it will be there.
You can try to isolate the outcome of the 100-way branch from things that don't need to know that outcome. Maybe just 100 different methods is fine; there's no need to invent objects you don't need if that makes the code unwieldy.

I think this is one of the few cases where large switches are the best answer unless some other solution presents itself.

I see the strategy pattern. If I have 100 different strategies...so be it. The giant switch statement is ugly. Are all the Commands valid classnames? If so, just use the command names as class names and create the strategy object with Activator.CreateInstance.
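A hedged sketch of the Activator.CreateInstance approach follows. The ICommandStrategy interface, the PingCommand class, and the "Command" suffix convention are assumptions made for illustration. Since the command name arrives over a socket, the sketch only instantiates types that implement the interface.

```csharp
using System;
using System.Linq;

public interface ICommandStrategy
{
    string Execute(string[] data);
}

// Hypothetical strategy for a "PING" command.
public sealed class PingCommand : ICommandStrategy
{
    public string Execute(string[] data) => "PONG";
}

public static class StrategyFactory
{
    // Resolves e.g. "ping" to the PingCommand type in this assembly.
    // Returns null when no matching concrete ICommandStrategy type exists.
    public static ICommandStrategy Create(string commandName)
    {
        string typeName = commandName + "Command";
        Type type = typeof(ICommandStrategy).Assembly.GetTypes().FirstOrDefault(t =>
            !t.IsAbstract &&
            typeof(ICommandStrategy).IsAssignableFrom(t) &&
            string.Equals(t.Name, typeName, StringComparison.OrdinalIgnoreCase));

        return type == null ? null : (ICommandStrategy)Activator.CreateInstance(type);
    }
}
```

Scanning the assembly on every call is slow, so a real dispatcher would probably cache the resolved types in a dictionary after the first lookup.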

There are two things that come to mind when talking about a large switch statement:
It violates OCP - you could be continuously maintaining a big function.
You could have bad performance: O(n).
On the other hand a map implementation can conform to OCP and could perform with potentially O(1).

I'd say that the problem is not the big switch statement, but rather the proliferation of code contained in it, and abuse of wrongly scoped variables.
I experienced this in one project myself, when more and more code went into the switch until it became unmaintainable. My solution was to define one parameter class which contained the context for the commands (name, parameters, whatever, collected before the switch), create a method for each case statement, and call that method with the parameter object from the case.
Of course, a fully OOP command dispatcher (based on magic such as reflection or mechanisms like Java Activation) is more beautiful, but sometimes you just want to fix things and get work done ;)

You can use a dictionary (or hash map if you are coding in Java). Steve McConnell calls this approach table-driven methods.

One improvement I can see would be to make your code driven by the data, so that, for example, each command code is matched to something that handles it (a function or an object). You could also use reflection to map strings to the objects/functions and resolve them at run time, but you may want to run some experiments to assess performance.

The best way to handle this particular problem (serialization and protocols) cleanly is to use an IDL and generate the marshaling code with switch statements. Whatever pattern you try to use otherwise (prototype factory, command pattern, etc.), you'll need to initialize a mapping between a command id/string and a class/function pointer somehow, and it will run slower than switch statements, since the compiler can use a perfect hash lookup for switch statements.

Yes, I think large case statements are a symptom that one's code can be improved... usually by implementing a more object oriented approach. For example, if I find myself evaluating the type of classes in a switch statement, that almost always means I could probably use Generics to eliminate the switch statement.

You could also take a language approach here and define the commands with associated data in a grammar. You can then use a generator tool to parse the language. I have used Irony for that purpose. Alternatively you can use the Interpreter pattern.
In my opinion the goal is not to build the purest OO model, but to create a flexible, extensible, maintainable and powerful system.

I recently had a similar problem with a huge switch statement, and I got rid of the ugly switch with the simplest solution: a lookup table and a function or method returning the value you expect. The command pattern is a nice solution, but having 100 classes is not nice, I think.
So I had something like:
switch (id)
{
case 1: DoSomething(url_1); break;
case 2: DoSomething(url_2); break;
// ...
// ...
case 100: DoSomething(url_100); break;
}
and I changed it to:
string url = GetUrl(id);
DoSomething(url);
GetUrl can go to the database and return the URL you are looking for, or it could use an in-memory dictionary holding the 100 URLs.
I hope this helps anyone out there replacing a huge, monstrous switch statement.

Think of how Windows was originally written in the application message pump. It sucked. Applications would run slower with the more menu options you added. As the command searched for ended further and further towards the bottom of the switch statement, there was an increasingly longer wait for response. It's not acceptable to have long switch statements, period. I made an AIX daemon as a POS command handler that could handle 256 unique commands without even knowing what was in the request stream received over TCP/IP. The very first character of the stream was an index into a function array. Any index not used was set to a default message handler; log and say goodbye.
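The function-array technique described above can be sketched in C# with an array of delegates. This is not the original AIX daemon's code; ByteDispatcher, the handler signature, and the reply text are hypothetical.

```csharp
using System;

public sealed class ByteDispatcher
{
    // One slot per possible value of the first byte.
    private readonly Func<byte[], string>[] _handlers = new Func<byte[], string>[256];

    public ByteDispatcher()
    {
        // Every unused slot falls back to the default handler.
        for (int i = 0; i < _handlers.Length; i++)
            _handlers[i] = DefaultHandler;
    }

    public void Register(byte commandId, Func<byte[], string> handler) =>
        _handlers[commandId] = handler;

    // The first byte indexes straight into the array, so lookup cost does not
    // grow with the number of commands.
    public string Dispatch(byte[] request)
    {
        if (request == null || request.Length == 0) return DefaultHandler(request);
        return _handlers[request[0]](request);
    }

    private static string DefaultHandler(byte[] request) => "unsupported command";
}
```

A handler registered for byte 0x01 runs for any request starting with that byte, and every other value goes to the default handler without any searching.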
