Thread Safe Class Library Design

Thread Safe Class Library Design - c#

I'm working on a class library and have opted for a route with my design to make implementation and thread safety slightly easier, however I'm wondering if there might be a better approach.
A brief background is that I have a multi-threaded heuristic algorithm within a class library, that once set-up with a scenario should attempt to solve it. However I obviously want it to be thread safe and if someone makes a change to anything while it is solving for that to causes crashes or errors.
The current approach I've got is if I have a class A, then I create a number InternalA instances for each A instance. The InternalA has many of the important properties from the A class, but is internal an inaccessible outside the library.
The downside of this, is that if I wish to extend the decision making logic (or actually let someone do this outside the library) then it means I need to change the code within the InternalA (or provide some sort of delegate function).
Does this sound like the right approach?

It's hard to really say from just that - but I can say that if you can make everything immutable, your life will be a lot easier. Look at how functional languages approach immutable data structures and collections. The less shared mutable data you have, the simple threading will be.

Why Not?
Create generic class, that accepts 2 members class (eg. Lock/Unlock) - so you could provide
Threadsafe impl (implmenetation can use Monitor.Enter/Exit inside)
System-wide safe impl (using Mutex)
Unsafe, but fast (using empty impl).

another way i have had some success with is by using interfaces to achieve functional separation. the cost of this approach is that you end up with some fields 'repeated' because each interface requires total separation from the others fields.
In my case I had 2 threads that need to pass over a set of data that potentially is large and needs as little garbage collection as possible. Ie I only want to pass change information from the first stage to the second. And then have the first process the next work unit.
this was achieved by the use of change buffers to pass changes from one interface to the next.
this allows one thread to work away at one interface, make all its changes and then publish a struct containing the changes that the other interface (thread) needs to apply prior to its work.
by doing this You have a double buffer ... (thread 1 produces a change report whilst thread 2 consumes the last report). If you add more interfaces (and threads) it appears like there are pulses of work moving through the threads.
This was based on my research and I have no doubt that there are better methods available now.
My aim when coming up with this however was to avoid the need for locks in the vast majority of code by designing out race conditions. the other major consideration is performance in garbage collection - which may not be an issue for you.
this way is all good until you need complex interactions between threads ... then you find that you start forcing the layout of your buffer structures for reuse to get around inheritance which in turn has an upkeep overhead.

A little more information on the problem to help...
The heuristic I'm using is to solve TSP like problems. What happens right at the start of each
calculation is that all the aspects that form the problem (sales man/places to visit) are cloned
so they aren't affected across threads.
This means each thread can change data (such as stock left on a sales man etc) as there are a number
of values that change during the calculation as things progress. What I'd quite like to do is allow
the checked such as HasSufficientStock() for a simple example to be override by a developer using the library.
Unforutantely at present however to add further protection across threads and makings some simplier/lightweight
classes I convert them to these internal classes, and these are the things that are actually used and cloned.
For example
class A
{
public double Stock { get; }
// Processing and cloning actually works using these InternalA's
internal InternalA ConvertToInternal() {}
}
internal class InternalA : ICloneable
{
public double Stock { get; set; }
public bool HasSufficientStock() {}
}

Related

Keeping class state valid vs performance

If i have public method that returns a reference type value, which is private field in the current class, do i need to return a copy of it? In my case i need to return List, but this method is called very often and my list holds ~100 items. The point is that if i return the same variable, everybody can modify it, but if i return a copy, the performance will degrade. In my case im trying to generate sudoku table, which is not fast procedure.
Internal class SudokuTable holds the values with their possible values. Public class SudokuGame handles UI requests and generates/solves SudokuTable. Is it good practice to chose performance instead OOP principles? If someone wants to make another library using my SudokuTable class, he wont be aware that he can brake its state with modifying the List that it returns.

Performance and object-oriented programming are not mutually exclusive - your code can be object-oriented and perform badly, etc.
In the case you state here I don't think it would be wise to allow external parts edit the internal state of a thing, so I would return an array or ReadOnlyCollection of the entries (it could be a potential possibility to use an ObservableCollection and monitor for tampering out-of-bounds, and 'handling' that accordingly (say, with an exception or something) - unsure how desirable this would be).
From there, you might consider how you expose access to these entries, trying to minimise the need for callers to get the full collection when all they need is to look up and return a specific one.
It's worth noting that an uneditable collection doesn't necessarily mean the state cannot be altered, either; if the entries are represented by a reference type rather than a value type then returning an entry leaves that open to tampering (potentially, depending on the class definition), so you might be better off with structs for the entry types.
At length, this, without a concrete example of where you're having problems, is a bit subjective and theoretical at the moment. Have you tried restricting the collection? And if so, how was the performance? Where were the issues? And so on.

Delegates (Lambda expressions) Vs Interfaces and abstract classes

I have been looking for a neat answer to this design question with no success. I could not find help neither in the ".NET Framework design guidelines" nor in the "C# programing guidelines".
I basically have to expose a pattern as an API so the users can define and integrate their algorithms into my framework like this:
1)
// This what I provide
public abstract class AbstractDoSomething{
public abstract SomeThing DoSomething();
}
Users need to implementing this abstract class, they have to implement the DoSomething method (that I can call from within my framework and use it)
2)
I found out that this can also acheived by using delegates:
public sealed class DoSomething{
public String Id;
Func<SomeThing> DoSomething;
}
In this case, a user can only use DoSomething class this way:
DoSomething do = new DoSomething()
{
Id="ThisIsMyID",
DoSomething = (() => new Something())
}
Question
Which of these two options is best for an easy, usable and most importantly understandable to expose as an API?
EDIT
In case of 1 : The registration is done this way (assuming MyDoSomething extends AbstractDoSomething:
MyFramework.AddDoSomething("DoSomethingIdentifier", new MyDoSomething());
In case of 2 : The registration is done like this:
MyFramework.AddDoSomething(new DoSomething());

Which of these two options is best for an easy, usable and most importantly understandable to expose as an API?
The first is more "traditional" in terms of OOP, and may be more understandable to many developers. It also can have advantages in terms of allowing the user to manage lifetimes of the objects (ie: you can let the class implement IDisposable and dispose of instances on shutdown, etc), as well as being easy to extend in future versions in a way that doesn't break backwards compatibility, since adding virtual members to the base class won't break the API. Finally, it can be simpler to use if you want to use something like MEF to compose this automatically, which can simplify/remove the process of "registration" from the user's standpoint (as they can just create the subclass, and drop it in a folder, and have it discovered/used automatically).
The second is a more functional approach, and is simpler in many ways. This allows the user to implement your API with far fewer changes to their existing code, as they just need to wrap the necessary calls in a lambda with closures instead of creating a new type.
That being said, if you're going to take the approach of using a delegate, I wouldn't even make the user create a class - just use a method like:
MyFramework.AddOperation("ThisIsMyID", () => DoFoo());
This makes it a little bit more clear, in my opinion, that you're adding an operation to the system directly. It also completely eliminates the need for another type in your public API (DoSomething), which again simplifies the API.

I would go with the abstract class / interface if:
DoSomething is required
DoSomething will normally get really big (so DoSomething's implementation can be splited into several private / protected methods)
I would go with delegates if:
DoSomething can be treated as an event (OnDoingSomething)
DoSomething is optional (so you default it to a no-op delegate)

Though personally, if in my hand, I would always go by Delegate Model. I just love the simplicity and elegance of higher order functions. But while implementing the model, be careful about memory leaks. Subscribed events are one of the most common reasons of memory leaks in .Net. This means, suppose if you have an object that has some events exposed, the original object would never be disposed until all events are unsubscribed since event creates a strong reference.

As is typical for most of these types of questions, I would say "it depends". :)
But I think the reason for using the abstract class versus the lambda really comes down to behavior. Usually, I think of the lambda being used as a callback type of functionality -- where you'd like something custom happen when something else happens. I do this a lot in my client-side code:
- make a service call
- get some data back
- now invoke my callback to handle that data accordingly
You can do the same with the lambdas -- they are specific and are targeted for very specific situations.
Using the abstract class (or interface) really comes down to where your class' behavior is driven by the environment around it. What's happening, what client am I dealing with, etc.? These larger questions could suggest that you should define a set of behaviors and then allow your developers (or consumers of your API) to create their own sets of behavior based upon their requirements. Granted, you could do the same with lambdas, but I think it would be more complex to develop and also more complex to clearly communicate to your users.
So, I guess my rough rule of thumb is:
- use lambdas for specific callback or side-effect customized behaviors;
- use abstract classes or interfaces to provide a mechanism for object behavior customization (or at least the majority of the object's primary behavior).
Sorry I can't give you a clear definition, but I hope this helps. Good luck!

A few things to consider :
How many different functions/delegates would need to be over-ridden? If may functions, inheretance will group "sets" of overrides in an easier to understand way. If you have a single "registration" function, but many sub-portions can be delegated out to the implementor, this is a classic case of the "Template" pattern, which makes the most sense to be inherited.
How many different implementations of the same function will be needed? If just one, then inheretance is good, but if you have many implementations a delegate might save overhead.
If there are multiple implementations, will the program need to switch between them? Or will it only use a single implementation. If switching is required, delegates might be easier, but I would caution this, especially depending on the answer to #1. See the Strategy Pattern.
If the override needs access to any protected members, then inheretance. If it can rely only on publics, then delegate.
Other choices would be events, and extension methods as well.

Maintaining modularity in Main()?

I'm writing the simple card game "War" for homework and now that the game works, I'm trying to make it more modular and organized. Below is a section of Main() containing the bulk of the program. I should mention, the course is being taught in C#, but it is not a C# course. Rather, we're learning basic logic and OOP concepts so I may not be taking advantage of some C# features.
bool sameCard = true;
while (sameCard)
{
sameCard = false;
card1.setVal(random.Next(1,14)); // set card value
val1 = determineFace(card1.getVal()); // assign 'face' cards accordingly
suit = suitArr[random.Next(0,4)]; // choose suit string from array
card1.setSuit(suit); // set card suit
card2.setVal(random.Next(1,14)); // rinse, repeat for card2...
val2 = determineFace(card2.getVal());
suit = suitArr[random.Next(0,4)];
card2.setSuit(suit);
// check if same card is drawn twice:
catchDuplicate(ref card1, ref card2, ref sameCard);
}
Console.WriteLine ("Player: {0} of {1}", val1, card1.getSuit());
Console.WriteLine ("Computer: {0} of {1}", val2, card2.getSuit());
// compare card values, display winner:
determineWinner(card1, card2);
So here are my questions:
Can I use loops in Main() and still consider it modular?
Is the card-drawing process written well/contained properly?
Is it considered bad practice to print messages in a method (i.e.: determineWinner())?
I've only been programming for two semesters and I'd like to form good habits at this stage. Any input/advice would be much appreciated.
Edit:
catchDuplicate() is now a boolean method and the call looks like this:
sameCard = catchDuplicate(card1, card2);
thanks to #Douglas.

Can I use loops in Main() and still consider it modular?
Yes, you can. However, more often than not, Main in OOP-programs contains only a handful of method-calls that initiate the core functionality, which is then stored in other classes.
Is the card-drawing process written well/contained properly?
Partially. If I understand your code correctly (you only show Main), you undertake some actions that, when done in the wrong order or with the wrong values, may not end up well. Think of it this way: if you sell your class library (not the whole product, but only your classes), what would be the clearest way to use your library for an uninitiated user?
I.e., consider a class Deck that contains a deck of cards. On creation it creates all cards and shuffles it. Give it a method Shuffle to shuffle the deck when the user of your class needs to shuffle and add methods like DrawCard for handling dealing cards.
Further: you have methods that are not contained within a class of their own yet have functionality that would be better of in a class. I.e., determineFace is better suited to be a method on class Card (assuming card2 is of type Card).
Is it considered bad practice to print messages in a method (i.e.: determineWinner())?
Yes and no. If you only want messages to be visible during testing, use Debug.WriteLine. In a production build, these will be no-ops. However, when you write messages in a production version, make sure that this is clear from the name of the method. I.e., WriteWinnerToConsole or something.
It's more common to not do this because: what format would you print the information? What text should come with it? How do you handle localization? However, when you write a program, obviously it must contain methods that write stuff to the screen (or form, or web page). These are usually contained in specific classes for that purpose. Here, that could be the class CardGameX for instance.
General thoughts
Think about the principle "one method/function should have only one task and one task only and it should not have side effects (like calculating square and printing, then printing is the side effect).".
The principle for classes is, very high-level: a class contains methods that logically belong together and operate on the same set of properties/fields. An example of the opposite: Shuffle should not be a method in class Card. However, it would belong logically in the class Deck.

If the main problem of your homework is create a modular application, you must encapsulate all logic in specialized classes.
Each class must do only one job.
Function that play with the card must be in a card class.
Function that draw cards, should be another class.
I think it is the goal of your homework, good luck!

Take all advices on "best practices" with a grain of salt. Always think for yourself.
That said:
Can I use loops in Main() and still consider it modular?
The two concepts are independent. If your Main() only does high-level logic (i.e. calls other methods) then it does not matter if it does so in a loop, after all the algorithm requires a loop. (you wouldn't add a loop unnecessarily, no?)
As a rule of thumb, if possible/practical, make your program self-documenting. Make it "readable" so, if a new person (or even you, a few months from now) looks at it they can understand it at any level.
Is the card-drawing process written well/contained properly?
No. First of all, a card should never be selected twice. For a more "modular" approach I would have something like this:
while ( Deck.NumCards >= 2 )
{
Card card1 = Deck.GetACard();
Card card2 = Deck.GetACard();
PrintSomeStuffAboutACard( GetWinner( card1, card2 ) );
}
Is it considered bad practice to print messages in a method (ie: determineWinner())?
Is the purpose of determineWinner to print a message? If the answer is "No" then it is not a matter of "bad practice", you function is plain wrong.
That said, there is such a thing as a "debug" build and a "release" build. To aid you in debugging the application and figuring out what works and what doesn't it is a good idea to add logging messages.
Make sure they are relevant and that they are not executed in the "release" build.

Q: Can I use loops in Main() and still consider it modular?
A: Yes, you can use loops, that doesn't really have an impact on modularity.
Q: Is the card-drawing process written well/contained properly?
A: If you want to be more modular, turn DrawCard into a function/method. Maybe just write DrawCards instead of DrawCard, but then there's an optimization-versus-modularity question there.
Q: Is it considered bad practice to print messages in a method (ie: determineWinner())?
A: I wouldn't say printing messages in a method is bad practice, it just depends on context. Ideally, the game itself doesn't handle anything but game logic. The program can have some kind of game object and it can read state from the game object. This way, you could technically change the game from being text-based to being graphical. I mean, that's ideal for modularity, but it may not be practical given a deadline. You always have to decide when you have to sacrifice a best practice because there isn't enough time. Sadly, this is all too often a common occurrence.
Separate game logic from the presentation of it. With a simple game like this, it's an unnecessary dependency.

C# On extending a large class in favor of readability

I have a large abstract class that handles weapons in my game. Combat cycles through a list of basic functions:
OnBeforeSwing
OnSwing
OnHit || OnMiss
What I have in mind is moving all combat damage-related calculations to another folder that handles just that. Combat damage-related calculations.
I was wondering if it would be correct to do so by making the OnHit method an extension one, or what would be the best approach to accomplish this.
Also. Periodically there are portions of the OnHit code that are modified, the hit damage formula is large because it takes into account a lot of conditions like resistances, transformation spells, item bonuses, special properties and other, similar, game elements.
This ends with a 500 line OnHit function, which kind of horrifies me. Even with region directives it's pretty hard to go through it without getting lost in the maze or even distracting yourself.
If I were to extend weapons with this function instead of just having the OnHit function, I could try to separate the different portions of the attack into other functions.
Then again, maybe I could to that by calling something like CombatSystem.HandleWeaponHit from the OnHit in the weapon class, and not use extension methods. It might be more appropriate.
Basically my question is if leaving it like this is really the best solution, or if I could (should?) move this part of the code into an extension method or a separate helper class that handles the damage model, and whether I should try and split the function into smaller "task" functions to improve readability.

I'm going to go out on a limb and suggest that your engine may not be abstracted enough. Mind you, I'm suggesting this without knowing anything else about your system aside from what you've told me in the OP.
In similar systems that I've designed, there were Actions and Effects. These were base classes. Each specific action (a machine gun attack, a specific spell, and so on) was a class derived from Action. Actions had an list of one or more specific effects that could be applied to Targets. This was achieved using Dependency Injection.
The combat engine didn't do all the math itself. Essentially, it asked the Target to calculate its defense rating, then cycled through all the active Actions and asked them to determine if any of its Effects applied to the Target. If they applied, it asked the Action to apply its relevant Effects to the Target.
Thus, the combat engine is small, and each Effect is very small, and easy to maintain.
If your system is one huge monolithic structure, you might consider a similar architecture.

OnHit should be an event handler, for starters. Any object that is hit should raise a Hit event, and then you can have one or more event handlers associated with that event.
If you cannot split up your current OnHit function into multiple event handlers, you can split it up into a single event handler but refactor it into multiple smaller methods that each perform a specific test or a specific calculation. It will make your code much more readable and maintainable.

IMHO Mike Hofer gives the leads.
The real point is not whether it's a matter of an extension method or not. The real point is that speaking of a single (extension or regular) method is unconceivable for such a complicated bunch of calculations.
Before thinking about the best implementation, you obviously need to rethink the whole thing to identify the best possible dispatch of responsibilities on objects. Each piece of elemental calculation must be done by the object it applies to. Always keep in mind the GRASP design patterns, especially Information Expert, Low Coupling and High Cohesion.
In general, each method in your project should always be a few lines of code long, no more. For each piece of calculation, think of which are all the classes on which this calculation is applicable. Then make this calculation a method of the common base class of them.
If there is no common base class, create a new interface, and make all these classes implement this interface. The interface might have methods or not : it can be used as a simple marker to identify the mentioned classes and make them have something in common.
Then you can build an elemental extension method like in this fake example :
public interface IExploding { int ExplosionRadius { get; } }
public class Grenade : IExploding { public int ExplosionRadius { get { return 30; } } ... }
public class StinkBomb : IExploding { public int ExplosionRadius { get { return 10; } } ... }
public static class Extensions
{
public static int Damages(this IExploding explosingObject)
{
return explosingObject.ExplosionRadius*100;
}
}
This sample is totally cheesy but simply aims to give leads to re-engineer your system in a more abstracted and maintenable way.
Hope this will help you !

Problems with adding a `lazy` keyword to C#

I would love to write code like this:
class Zebra
{
public lazy int StripeCount
{
get { return ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce(); }
}
}
EDIT: Why? I think it looks better than:
class Zebra
{
private Lazy<int> _StripeCount;
public Zebra()
{
this._StripeCount = new Lazy(() => ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce());
}
public lazy int StripeCount
{
get { return this._StripeCount.Value; }
}
}
The first time you call the property, it would run the code in the get block, and afterward would just return the value from it.
My questions:
What costs would be involved with adding this kind of keyword to the library?
What situations would this be problematic in?
Would you find this useful?
I'm not starting a crusade to get this into the next version of the library, but I am curious what kind of considerations a feature such as this should have to go through.

I am curious what kind of considerations a feature such as this should have to go through.
First off, I write a blog about this subject, amongst others. See my old blog:
http://blogs.msdn.com/b/ericlippert/
and my new blog:
http://ericlippert.com
for many articles on various aspects of language design.
Second, the C# design process is now open for view to the public, so you can see for yourself what the language design team considers when vetting new feature suggestions. See https://github.com/dotnet/roslyn/ for details.
What costs would be involved with adding this kind of keyword to the library?
It depends on a lot of things. There are, of course, no cheap, easy features. There are only less expensive, less difficult features. In general, the costs are those involving designing, specifying, implementing, testing, documenting and maintaining the feature. There are more exotic costs as well, like the opportunity cost of not doing a better feature, or the cost of choosing a feature that interacts poorly with future features we might want to add.
In this case the feature would probably be simply making the "lazy" keyword a syntactic sugar for using Lazy<T>. That's a pretty straightforward feature, not requiring a lot of fancy syntactic or semantic analysis.
What situations would this be problematic in?
I can think of a number of factors that would cause me to push back on the feature.
First off, it is not necessary; it's merely a convenient sugar. It doesn't really add new power to the language. The benefits don't seem to be worth the costs.
Second, and more importantly, it enshrines a particular kind of laziness into the language. There is more than one kind of laziness, and we might choose wrong.
How is there more than one kind of laziness? Well, think about how it would be implemented. Properties are already "lazy" in that their values are not calculated until the property is called, but you want more than that; you want a property that is called once, and then the value is cached for the next time. By "lazy" essentially you mean a memoized property. What guarantees do we need to put in place? There are many possibilities:
Possibility #1: Not threadsafe at all. If you call the property for the "first" time on two different threads, anything can happen. If you want to avoid race conditions, you have to add synchronization yourself.
Possibility #2: Threadsafe, such that two calls to the property on two different threads both call the initialization function, and then race to see who fills in the actual value in the cache. Presumably the function will return the same value on both threads, so the extra cost here is merely in the wasted extra call. But the cache is threadsafe, and doesn't block any thread. (Because the threadsafe cache can be written with low-lock or no-lock code.)
Code to implement thread safety comes at a cost, even if it is low-lock code. Is that cost acceptable? Most people write what are effectively single-threaded programs; does it seem right to add the overhead of thread safety to every single lazy property call whether it's needed or not?
Possibility #3: Threadsafe such that there is a strong guarantee that the initialization function will only be called once; there is no race on the cache. The user might have an implicit expectation that the initialization function is only called once; it might be very expensive and two calls on two different threads might be unacceptable. Implementing this kind of laziness requires full-on synchronization where it is possible that one thread blocks indefinitely while the lazy method is running on another thread. It also means there could be deadlocks if there's a lock-ordering problem with the lazy method.
That adds even more cost to the feature, a cost that is borne equally by people who do not take advantage of it (because they are writing single-threaded programs).
So how do we deal with this? We could add three features: "lazy not threadsafe", "lazy threadsafe with races" and "lazy threadsafe with blocking and maybe deadlocks". And now the feature just got a whole lot more expensive and way harder to document. This produces an enormous user education problem. Every time you give a developer a choice like this, you present them with an opportunity to write terrible bugs.
Third, the feature seems weak as stated. Why should laziness be applied merely to properties? It seems like this could be applied generally through the type system:
lazy int x = M(); // doesn't call M()
lazy int y = x + x; // doesn't add x + x
int z = y * y; // now M() is called once and cached.
// x + x is computed and cached
// y * y is computed
We try to not do small, weak features if there is a more general feature that is a natural extension of it. But now we're talking about really serious design and implementation costs.
Would you find this useful?
Personally? Not really useful. I write lots of simple low-lock lazy code mostly using Interlocked.Exchange. (I don't care if the lazy method gets run twice and one of the results discarded; my lazy methods are never that expensive.) The pattern is straightforward, I know it to be safe, there are never extra objects allocated for the delegate or the locks, and if I have something a little more complex I can always use Lazy<T> to do the work for me. It would be a small convenience.

The system library already has a class that does what you want: System.Lazy<T>
I'm sure it could be integrated into the language, but as Eric Lippert will tell you adding features to a language is not something to take lightly. Many things have to be considered, and the benefit/cost ratio needs to be very good. Since System.Lazy already handles this pretty well, I doubt we will see this anytime soon.

Do you know about the Lazy<T> class that was added in .Net 4.0?
http://sankarsan.wordpress.com/2009/10/04/laziness-in-c-4-0-lazyt/

Have you tryed / Dou you mean this?
private Lazy<int> MyExpensiveCountingValue = new Lazy<int>(new Func<int>(()=> ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce()));
public int StripeCount
{
get
{
return MyExpensiveCountingValue.Value;
}
}
EDIT:
after your post edit I would add that your idea is definitely more elegant, but still has the same functionallity!!!.

This is unlikely to be added to the C# language because you can easily do it yourself, even without Lazy<T>.
A simple, but not thread-safe, example:
class Zebra
{
private int? stripeCount;
public int StripeCount
{
get
{
if (this.stripeCount == null)
{
this.stripeCount = ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce();
}
return this.stripeCount;
}
}
}

If you don't mind using a post-compiler, CciSharp has this feature:
class Zebra {
[Lazy] public int StripeCount {
get { return ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce(); }
}
}

Have a look at the Lazy<T> type. Also ask Eric Lippert about adding things like this to the language, he would no doubt have a view.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.