Problems with adding a `lazy` keyword to C#

I would love to write code like this:
class Zebra
{
public lazy int StripeCount
{
get { return ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce(); }
}
}
EDIT: Why? I think it looks better than:
class Zebra
{
private Lazy<int> _StripeCount;
public Zebra()
{
this._StripeCount = new Lazy<int>(() => ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce());
}
public int StripeCount
{
get { return this._StripeCount.Value; }
}
}
The first time you call the property, it would run the code in the get block, and afterward would just return the value from it.
My questions:
What costs would be involved with adding this kind of keyword to the library?
What situations would this be problematic in?
Would you find this useful?
I'm not starting a crusade to get this into the next version of the library, but I am curious what kind of considerations a feature such as this should have to go through.

I am curious what kind of considerations a feature such as this should have to go through.
First off, I write a blog about this subject, amongst others. See my old blog:
http://blogs.msdn.com/b/ericlippert/
and my new blog:
http://ericlippert.com
for many articles on various aspects of language design.
Second, the C# design process is now open for view to the public, so you can see for yourself what the language design team considers when vetting new feature suggestions. See https://github.com/dotnet/roslyn/ for details.
What costs would be involved with adding this kind of keyword to the library?
It depends on a lot of things. There are, of course, no cheap, easy features. There are only less expensive, less difficult features. In general, the costs are those involving designing, specifying, implementing, testing, documenting and maintaining the feature. There are more exotic costs as well, like the opportunity cost of not doing a better feature, or the cost of choosing a feature that interacts poorly with future features we might want to add.
In this case the feature would probably be simply making the "lazy" keyword a syntactic sugar for using Lazy<T>. That's a pretty straightforward feature, not requiring a lot of fancy syntactic or semantic analysis.
What situations would this be problematic in?
I can think of a number of factors that would cause me to push back on the feature.
First off, it is not necessary; it's merely a convenient sugar. It doesn't really add new power to the language. The benefits don't seem to be worth the costs.
Second, and more importantly, it enshrines a particular kind of laziness into the language. There is more than one kind of laziness, and we might choose wrong.
How is there more than one kind of laziness? Well, think about how it would be implemented. Properties are already "lazy" in that their values are not calculated until the property is called, but you want more than that; you want a property that is called once, and then the value is cached for the next time. By "lazy" essentially you mean a memoized property. What guarantees do we need to put in place? There are many possibilities:
Possibility #1: Not threadsafe at all. If you call the property for the "first" time on two different threads, anything can happen. If you want to avoid race conditions, you have to add synchronization yourself.
Possibility #2: Threadsafe, such that two calls to the property on two different threads both call the initialization function, and then race to see who fills in the actual value in the cache. Presumably the function will return the same value on both threads, so the extra cost here is merely in the wasted extra call. But the cache is threadsafe, and doesn't block any thread. (Because the threadsafe cache can be written with low-lock or no-lock code.)
Code to implement thread safety comes at a cost, even if it is low-lock code. Is that cost acceptable? Most people write what are effectively single-threaded programs; does it seem right to add the overhead of thread safety to every single lazy property call whether it's needed or not?
Possibility #3: Threadsafe such that there is a strong guarantee that the initialization function will only be called once; there is no race on the cache. The user might have an implicit expectation that the initialization function is only called once; it might be very expensive and two calls on two different threads might be unacceptable. Implementing this kind of laziness requires full-on synchronization where it is possible that one thread blocks indefinitely while the lazy method is running on another thread. It also means there could be deadlocks if there's a lock-ordering problem with the lazy method.
That adds even more cost to the feature, a cost that is borne equally by people who do not take advantage of it (because they are writing single-threaded programs).
So how do we deal with this? We could add three features: "lazy not threadsafe", "lazy threadsafe with races" and "lazy threadsafe with blocking and maybe deadlocks". And now the feature just got a whole lot more expensive and way harder to document. This produces an enormous user education problem. Every time you give a developer a choice like this, you present them with an opportunity to write terrible bugs.
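For what it's worth, the three possibilities above line up roughly with the three values of the existing LazyThreadSafetyMode enum that Lazy<T> accepts. A minimal sketch, assuming a stand-in CountStripes method and a Calls counter that are not part of the question:

```csharp
using System;
using System.Threading;

class Zebra
{
    // Hypothetical counter so the number of factory runs can be observed.
    public static int Calls;

    // Possibility #1: no thread safety at all.
    private readonly Lazy<int> notThreadSafe =
        new Lazy<int>(CountStripes, LazyThreadSafetyMode.None);

    // Possibility #2: threads may race to run the factory; the first published value wins.
    private readonly Lazy<int> racing =
        new Lazy<int>(CountStripes, LazyThreadSafetyMode.PublicationOnly);

    // Possibility #3: the factory runs at most once; other threads block until it finishes.
    private readonly Lazy<int> blocking =
        new Lazy<int>(CountStripes, LazyThreadSafetyMode.ExecutionAndPublication);

    // Stand-in for the expensive method in the question (assumed, returns a fixed value).
    private static int CountStripes()
    {
        Interlocked.Increment(ref Calls);
        return 42;
    }

    public int StripeCountNotThreadSafe { get { return notThreadSafe.Value; } }
    public int StripeCountRacing { get { return racing.Value; } }
    public int StripeCountBlocking { get { return blocking.Value; } }
}
```

A single `lazy` keyword would have to pick one of these modes for everybody, which is the design problem described above.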
Third, the feature seems weak as stated. Why should laziness be applied merely to properties? It seems like this could be applied generally through the type system:
lazy int x = M(); // doesn't call M()
lazy int y = x + x; // doesn't add x + x
int z = y * y; // now M() is called once and cached.
// x + x is computed and cached
// y * y is computed
We try to not do small, weak features if there is a more general feature that is a natural extension of it. But now we're talking about really serious design and implementation costs.
Would you find this useful?
Personally? Not really useful. I write lots of simple low-lock lazy code mostly using Interlocked.Exchange. (I don't care if the lazy method gets run twice and one of the results discarded; my lazy methods are never that expensive.) The pattern is straightforward, I know it to be safe, there are never extra objects allocated for the delegate or the locks, and if I have something a little more complex I can always use Lazy<T> to do the work for me. It would be a small convenience.
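A rough sketch of what a low-lock memoized property can look like. This is not necessarily the exact pattern meant above: it uses Interlocked.CompareExchange rather than Interlocked.Exchange, it boxes the int (so one small object is allocated), and LowLockZebra, CountStripes and Calls are made-up names for illustration:

```csharp
using System.Threading;

class LowLockZebra
{
    // Hypothetical counter so the number of computations can be observed.
    public static int Calls;

    // Boxed value; a null reference means "not computed yet".
    private object stripeCount;

    public int StripeCount
    {
        get
        {
            object cached = Volatile.Read(ref stripeCount);
            if (cached == null)
            {
                // Two threads may both get here and both compute the value.
                object computed = CountStripes();
                // Publish only if the slot is still empty; otherwise keep the winner's value.
                cached = Interlocked.CompareExchange(ref stripeCount, computed, null) ?? computed;
            }
            return (int)cached;
        }
    }

    // Stand-in for the expensive method (assumed, returns a fixed value).
    private static int CountStripes()
    {
        Interlocked.Increment(ref Calls);
        return 42;
    }
}
```

No thread ever blocks here; the price is that the computation may occasionally run twice, with one result discarded.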

The system library already has a class that does what you want: System.Lazy<T>
I'm sure it could be integrated into the language, but as Eric Lippert will tell you, adding features to a language is not something to take lightly. Many things have to be considered, and the benefit/cost ratio needs to be very good. Since System.Lazy<T> already handles this pretty well, I doubt we will see this anytime soon.

Do you know about the Lazy<T> class that was added in .NET 4.0?
http://sankarsan.wordpress.com/2009/10/04/laziness-in-c-4-0-lazyt/

Have you tried this? Is this what you mean?
private Lazy<int> MyExpensiveCountingValue = new Lazy<int>(new Func<int>(()=> ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce()));
public int StripeCount
{
get
{
return MyExpensiveCountingValue.Value;
}
}
EDIT:
After your edit, I would add that your idea is definitely more elegant, but it still has the same functionality.

This is unlikely to be added to the C# language because you can easily do it yourself, even without Lazy<T>.
A simple, but not thread-safe, example:
class Zebra
{
private int? stripeCount;
public int StripeCount
{
get
{
if (this.stripeCount == null)
{
this.stripeCount = ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce();
}
return this.stripeCount.Value;
}
}
}

If you don't mind using a post-compiler, CciSharp has this feature:
class Zebra {
[Lazy] public int StripeCount {
get { return ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce(); }
}
}

Have a look at the Lazy<T> type. Also ask Eric Lippert about adding things like this to the language; he would no doubt have a view.

Related

C# Efficiency for method parameters

Am I correct in saying that this:
public static void MethodName(bool first, bool second, bool third)
{
//Do something
}
Is more efficient than this:
public static void MethodName(bool[] boolArray)
{
bool first = boolArray[0];
bool second = boolArray[1];
bool third = boolArray[2];
//Do something
}
My thoughts are that for both they would have to declare first, second and third - just in different places. But for the second one it has to add it into an array and then unpack it again.
Unless you declared the array like this:
MethodName(new[] { true, true, true });
In which case I am not sure which is faster?
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
In this case performance is not particularly important, but it would be helpful for me to clarify this point.
Also, the second one has the advantage that you can pass as many values as you like to it, and it is also easier to read I think?
The reason I am thinking of using this is because there are already about 30 parameters being passed into the method and I feel it is becoming confusing to keep adding more. All these bools are closely related so I thought it may make the code more manageable to package them up.
I am working on existing code and it is not in my project scope to spend time reworking the method to decrease the number of parameters that are passed into the method, but I thought it would be good practice to understand the implications of this change.
In terms of performance, there's just one answer to your question:
"Programmers waste enormous amounts of time thinking about, or
worrying about, the speed of noncritical parts of their programs, and
these attempts at efficiency actually have a strong negative impact
when debugging and maintenance are considered. We should forget about
small efficiencies, say about 97% of the time: premature optimization
is the root of all evil. Yet we should not pass up our opportunities
in that critical 3%."
In terms of productivity, parameters > arrays.
Side note
Everyone should know that this was said by Donald Knuth in 1974. More than 40 years after this statement, we still fall into premature optimization (or even pointless optimization) very often!
Further reading
I would take a look at this other Q&A on Software Engineering
Am I correct in saying that this:
Is more efficient than this:
In isolation, yes. Unless the caller already has that array, in which case the second is the same or even (for larger argument types or more arguments) minutely faster.
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
Why are you thinking about the second one? If it is more natural at the point of the call, then whatever makes it more natural is also likely to have a performance impact in the wider context that outweighs this difference and makes the second the better choice.
If you're starting off with three separate bools and you're wrapping them just to unwrap them again then I don't see what this offers in practice except for more typing.
So your reason for considering this at all is the more important thing here.
In this case performance is not particularly important
Then really don't worry about it. It's certainly a known technique for hot-path code that hits params to offer overloads that take set numbers of individual parameters, but it really does only make a difference in hot paths. If you aren't in a hot path, the lifetime saving of computing time from picking whichever of the two is indeed more efficient is unlikely to add up to the amount of time it took you to write your post here.
If you are in a hot path and really need to shave off every nanosecond you can because you're looping so much that it will add up to something real, then you have to measure. Isolated changes have non-isolated effects when it comes to performance, so it doesn't matter whether the people on the Internet tell you A is faster than B if the wider context means the code calling A is slower than B. Measure. Measurement number one is "can I even notice?", if the answer to that measurement is "no" then leave it alone and find somewhere where the performance impact is noticeable to optimise instead.
Write "natural" code to start with, before seeing if little tweaks can have a performance impact in the bits that are actually hurting you. This isn't just because of the importance of readability and so on, but also because:
The more "natural" code in a given language very often is the more efficient. Even if you think it can't be, it's more likely to benefit from some compiler optimisation behind the scenes.
The more "natural" code is a lot easier to tweak for performance when it is necessary than code doing a bunch of strange things.
I don't think this would affect the performance of your app at all.
Personally
I'd go with the first option for two reasons:
Naming each parameter: if the project is a large scale project and there is a lot of coding or for possible future edits and enhancements.
Usability: if you are sending a list of similar parameters then you must use an array or a list; if it is just a couple of parameters that happen to be of the same type then you should be sending them separately.
A third way would be to use params (see Params - MSDN).
In the end I don't think it will change much in performance.
An array, though, inherits from the abstract Array class, which implements ICloneable, IList, ICollection, IEnumerable, IStructuralComparable and IStructuralEquatable (and a T[] also implements IEnumerable<T>). This means an array object carries more overhead than three value-type parameters, which will likely make it somewhat slower.
Array - MSDN
You could test performance differences on both, but I doubt there would be much difference.
You have to consider maintainability, is another programmer, or even yourself going to understand why you did it that way in a few weeks, or a few months time when it's time for review? Is it easily extended, can you pass different object types through to your method?
If you're passing a collection of items, then certainly packing them into an array would be quicker than specifying a new parameter for each additional item?
If you have to, you can do it that way, but have you considered a params array?
Why use the params keyword?
public static void MethodName(params bool[] boolArray)
{
//extract data here
}
Agreed with Matias' answer.
I also want to add that you need to add error checking, as you are passed an array and nowhere is it stated how many elements the array will contain. So you must first check that you have three elements in your array. This will offset the small perf gain that you may have earned.
Also, if you ever want to make this method available to other developers (as part of an API, public or private), IntelliSense will not help them at all with which parameters they're supposed to set...
While using three parameters, you can do this :
///<summary>
///This method does something
///</summary>
///<param name="first">The first parameter</param>
///<param name="second">The second parameter</param>
///<param name="third">The third parameter</param>
public static void MethodName(bool first, bool second, bool third)
{
//Do something
}
And it will be displayed nicely and helpfully to others...
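To make the error-checking point concrete, here is a minimal sketch of what the array (or params) version has to do before it can touch its arguments. The Checks class and the CountEnabled method are invented for the example:

```csharp
using System;

static class Checks
{
    // Hypothetical method: counts how many of exactly three flags are set.
    public static int CountEnabled(params bool[] flags)
    {
        // With an array the compiler cannot enforce the element count,
        // so it has to be validated at run time.
        if (flags == null)
            throw new ArgumentNullException("flags");
        if (flags.Length != 3)
            throw new ArgumentException("Expected exactly three values.", "flags");

        int enabled = 0;
        foreach (bool flag in flags)
        {
            if (flag) enabled++;
        }
        return enabled;
    }
}
```

With three named bool parameters, a call such as CountEnabled(true) would simply not compile; here it compiles and only fails when it runs.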
I would take a different approach and use Flags;
public const int FIRST = 1; // example bit value; the original does not define FIRST
public static void MethodName(int flags)
{
    if ((flags & FIRST) != 0) { }
}
Chances are the compiler will do its own optimizations;
Check http://rextester.com/QRFL3116 Added method from Jamiec comment
M1 took 5ms
M2 took 23ms
M3 took 4ms
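The flags idea above is usually written in C# with a [Flags] enum rather than a bare int. A small sketch, where the Options enum and the Describe method are names made up for illustration:

```csharp
using System;

[Flags]
enum Options
{
    None = 0,
    First = 1,
    Second = 2,
    Third = 4
}

static class FlagsDemo
{
    // Unpacks the three related booleans from a single argument.
    public static string Describe(Options options)
    {
        bool first = (options & Options.First) != 0;
        bool second = (options & Options.Second) != 0;
        bool third = (options & Options.Third) != 0;
        return string.Format("first={0}, second={1}, third={2}", first, second, third);
    }
}
```

A call then reads FlagsDemo.Describe(Options.First | Options.Third), which keeps each value named at the call site while still passing one argument.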

Isn't there a point where encapsulation gets ridiculous?

For my software development programming class we were supposed to make a "Feed Manager" type program for RSS feeds. Here is how I handled the implementation of FeedItems.
Nice and simple:
struct FeedItem {
string title;
string description;
string url;
}
I got marked down for that, the "correct" example answer is as follows:
class FeedItem
{
public:
FeedItem(string title, string description, string url);
inline string getTitle() const { return this->title; }
inline string getDescription() const { return this->description; }
inline string getURL() const { return this->url; }
inline void setTitle(string title) { this->title = title; }
inline void setDescription(string description){ this->description = description; }
inline void setURL(string url) { this->url = url; }
private:
string title;
string description;
string url;
};
Now to me, this seems stupid. I honestly can't believe I got marked down, when this does the exact same thing that mine does with a lot more overhead.
It reminds me of how in C# people always do this:
public class Example
{
private int _myint;
public int MyInt
{
get
{
return this._myint;
}
set
{
this._myint = value;
}
}
}
I mean I GET why they do it, maybe later on they want to validate the data in the setter or increment it in the getter. But why don't you people just do THIS UNTIL that situation arises?
public class Example
{
public int MyInt;
}
Sorry this is kind of a rant and not really a question, but the redundancy is maddening to me. Why are getters and setters so loved, when they are unneeded?
It's an issue of "best practice" and style.
You don't ever want to expose your data members directly. You always want to be able to control how they are accessed. I agree, in this instance, it seems a bit ridiculous, but it is intended to teach you that style so you get used to it.
It helps to define a consistent interface for classes. You always know how to get to something --> calling its get method.
Then there's also the reusability issue. Say, down the road, you need to change what happens when somebody accesses a data member. You can do that without forcing clients to recompile code. You can simply change the method in the class and guarantee that the new logic is utilized.
Here's a nice long SO discussion on the subject: Why use getters and setters.
The question you want to ask yourself is "What's going to happen 3 months from now when you realize that FeedItem.url does need to be validated but it's already referenced directly from 287 other classes?"
The main reason to do this before it's needed is for versioning.
Fields behave differently than properties, especially when using them as an lvalue (where it's often not allowed, especially in C#). Also, if you need to, later, add property get/set routines, you'll break your API - users of your class will need to rewrite their code to use the new version.
It's much safer to do this up front.
C# 3, btw, makes this easier:
public class Example
{
public int MyInt { get; set; }
}
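As a sketch of the versioning argument: a property that started life as an auto-property can later grow validation while calling code keeps using the same syntax. The validation rule below (Uri.IsWellFormedUriString with an absolute URL) is only an assumed example of what "validate the URL" might mean:

```csharp
using System;

class FeedItem
{
    private string url;

    // Still a plain auto-property.
    public string Title { get; set; }

    // Was "public string Url { get; set; }" in the first version;
    // callers still write item.Url = "..." after the change.
    public string Url
    {
        get { return url; }
        set
        {
            if (!Uri.IsWellFormedUriString(value, UriKind.Absolute))
                throw new ArgumentException("Not a well-formed absolute URL.", "value");
            url = value;
        }
    }
}
```

Had Url been a public field, switching it to a property later would have been a breaking change for code compiled against the old version.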
I absolutely agree with you. But in life you should probably do The Right Thing: in school, it's to get good marks. In your workplace it's to fulfill specs. If you want to be stubborn, then that's fine, but do explain yourself -- cover your bases in comments to minimize the damage you might get.
In your particular example above I can see you might want to validate, say, the URL. Maybe you'd even want to sanitize the title and the description, but either way I think this is the sort of thing you can tell early on in the class design. State your intentions and your rationale in comments. If you don't need validation then you don't need a getter and setter, you're absolutely right.
Simplicity pays, it's a valuable feature. Never do anything religiously.
If something's a simple struct, then yes it's ridiculous because it's just DATA.
This is really just a throwback to the beginning of OOP where people still didn't get the idea of classes at all. There's no reason to have hundreds of get and set methods just in case you might change getId() to be a remote call to the Hubble telescope some day.
You really want that functionality at the TOP level; at the bottom it's worthless. I.e., you would have a complex method that was sent a pure virtual class to work on, guaranteeing it can still work no matter what happens below. Just placing it randomly in every struct is a joke, and it should never be done for a POD.
Maybe both options are a bit wrong, because neither version of the class has any behaviour. It's hard to comment further without more context.
See http://www.pragprog.com/articles/tell-dont-ask
Now let's imagine that your FeedItem class has become wonderfully popular and is being used by projects all over the place. You decide you need to (as other answers have suggested) validate the URL that has been provided.
Happy days, you have written a setter for the URL. You edit this, validate the URL and throw an exception if it is invalid. You release your new version of the class and everyone using it is happy. (Let's ignore checked vs unchecked exceptions to keep this on-track.)
Except, then you get a call from an angry developer. They were reading a list of feed items from a file when their application starts up. And now, if someone makes a little mistake in the configuration file, your new exception is thrown and the whole system doesn't start up, just because one frigging feed item was wrong!
You may have kept the method signature the same, but you have changed the semantics of the interface and so it breaks dependent code. Now, you can either take the high ground and tell them to rewrite their program properly, or you humbly add setURLAndValidate.
Keep in mind that coding "best practices" are often made obsolete by advances in programming languages.
For example, in C# the getter/setter concept has been baked into the language in the form of properties. C# 3.0 made this easier with the introduction of automatic properties, where the compiler automatically generates the getter/setter for you. C# 3.0 also introduced object initializers, which means that in most cases you no longer need to declare constructors which simply initialize properties.
So the canonical C# way to do what you're doing would look like this:
class FeedItem
{
public string Title { get; set; } // automatic properties
public string Description { get; set; }
public string Url { get; set; }
};
And the usage would look like this (using object initializer):
FeedItem fi = new FeedItem() { Title = "Some Title", Description = "Some Description", Url = "Some Url" };
The point is that you should try to learn what the best practice or canonical way of doing things is for the particular language you are using, and not simply copy old habits which no longer make sense.
As a C++ developer I make my members always private simply to be consistent. So I always know that I need to type p.x(), and not p.x.
Also, I usually avoid implementing setter methods. Instead of changing an object I create a new one:
p = Point(p.x(), p.y() + 1);
This preserves encapsulation as well.
There absolutely is a point where encapsulation becomes ridiculous.
The more abstraction that is introduced into code, the greater your up-front education and learning-curve cost will be.
Everyone who knows C can debug a horribly written 1000-line function that uses just the basic language and the C standard library. Not everyone can debug the framework you've invented. Every introduced level of encapsulation/abstraction must be weighed against the cost. That's not to say it's not worth it, but as always you have to find the optimal balance for your situation.
One of the problems that the software industry faces is the problem of reusable code. It's a big problem. In the hardware world, hardware components are designed once, then the design is reused later when you buy the components and put them together to make new things.
In the software world every time we need a component we design it again and again. It's very wasteful.
Encapsulation was proposed as a technique for ensuring that modules that are created are reusable. That is, there is a clearly defined interface that abstracts the details of the module and makes it easier to use that module later. The interface also prevents misuse of the object.
The simple classes that you build in class do not adequately illustrate the need for the well-defined interface. Saying "But why don't you people just do THIS UNTIL that situation arises?" will not work in real life. What you are learning in your software engineering course is to engineer software that other programmers will be able to use. Consider that the creators of libraries such as those provided by the .NET Framework and the Java API absolutely require this discipline. If they decided that encapsulation was too much trouble, these environments would be almost impossible to work with.
Following these guidelines will result in high quality code in the future. Code that adds value to the field because more than just yourself will benefit from it.
One last point: encapsulation also makes it possible to adequately test a module and be reasonably sure that it works. Without encapsulation, testing and verification of your code would be that much more difficult.
Getters/Setters are, of course, good practice but they are tedious to write and, even worse, to read.
How many times have we read a class with half a dozen member variables and accompanying getters/setters, each with the full hog #param/#return HTML encoded, famously useless comment like 'get the value of X', 'set the value of X', 'get the value of Y', 'set the value of Y', 'get the value of Z', 'set the value of Zzzzzzzzzzzzz'. thump!
This is a very common question: "But why don't you people just do THIS UNTIL that situation arises?".
The reason is simple: usually it is much cheaper not to fix/retest/redeploy it later, but to do it right the first time.
Old estimates say that maintenance costs are 80%, and much of that maintenance is exactly what you are suggesting: doing the right thing only after someone had a problem. Doing it right the first time allows us to concentrate on more interesting things and to be more productive.
Sloppy coding is usually very unprofitable - your customers are unhappy because the product is unreliable and they are not productive when they are using it. Developers are not happy either - they spend 80% of their time doing patches, which is boring. Eventually you can end up losing both customers and good developers.
I agree with you, but it's important to survive the system. While in school, pretend to agree. In other words, being marked down is detrimental to you and it is not worth it to be marked down for your principles, opinions, or values.
Also, while working on a team or at an employer, pretend to agree. Later, start your own business and do it your way. While you try the ways of others, be calmly open-minded toward them -- you may find that these experiences re-shape your views.
Encapsulation is theoretically useful in case the internal implementation ever changes. For example, if the per-object URL became a calculated result rather than a stored value, then the getUrl() encapsulation would continue to work. But I suspect you already have heard this side of it.

Thread Safe Class Library Design

I'm working on a class library and have opted for a route with my design to make implementation and thread safety slightly easier, however I'm wondering if there might be a better approach.
A brief background is that I have a multi-threaded heuristic algorithm within a class library that, once set up with a scenario, should attempt to solve it. However, I obviously want it to be thread safe, and if someone makes a change to anything while it is solving, I don't want that to cause crashes or errors.
The current approach I've got is that if I have a class A, then I create a number of InternalA instances for each A instance. The InternalA has many of the important properties from the A class, but is internal and inaccessible outside the library.
The downside of this, is that if I wish to extend the decision making logic (or actually let someone do this outside the library) then it means I need to change the code within the InternalA (or provide some sort of delegate function).
Does this sound like the right approach?
It's hard to really say from just that - but I can say that if you can make everything immutable, your life will be a lot easier. Look at how functional languages approach immutable data structures and collections. The less shared mutable data you have, the simpler threading will be.
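As a minimal sketch of the immutable approach, assuming a made-up Salesman type with a Stock value (the names are not from the question's library): instead of mutating shared state, each "change" produces a new object, so any thread holding the old one is unaffected.

```csharp
// Hypothetical immutable type: all state is set in the constructor and never changes.
sealed class Salesman
{
    public string Name { get; }
    public double Stock { get; }

    public Salesman(string name, double stock)
    {
        Name = name;
        Stock = stock;
    }

    // Returns a modified copy instead of changing this instance.
    public Salesman WithStock(double stock)
    {
        return new Salesman(Name, stock);
    }
}
```

A solver thread can then work with its own chain of copies, while the instance the caller handed in can be shared freely without locks.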
Why Not?
Create a generic class that accepts a class with two members (e.g. Lock/Unlock), so you could provide:
A thread-safe implementation (the implementation can use Monitor.Enter/Exit inside)
A system-wide safe implementation (using Mutex)
An unsafe, but fast, implementation (using an empty implementation).
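One way the suggestion above might look, as a sketch only. ILockPolicy, MonitorPolicy, NoLockPolicy and Counter are invented names, and the system-wide Mutex variant is left out for brevity:

```csharp
using System.Threading;

// Hypothetical policy interface with the two members mentioned above.
interface ILockPolicy
{
    void Lock();
    void Unlock();
}

// Thread-safe within one process, using Monitor.Enter/Exit.
sealed class MonitorPolicy : ILockPolicy
{
    private readonly object gate = new object();
    public void Lock() { Monitor.Enter(gate); }
    public void Unlock() { Monitor.Exit(gate); }
}

// Unsafe but fast: both members do nothing.
sealed class NoLockPolicy : ILockPolicy
{
    public void Lock() { }
    public void Unlock() { }
}

// The generic class takes its locking behaviour from the type argument.
sealed class Counter<TLock> where TLock : ILockPolicy, new()
{
    private readonly TLock policy = new TLock();
    private int count;

    public void Increment()
    {
        policy.Lock();
        try { count++; }
        finally { policy.Unlock(); }
    }

    public int Value
    {
        get
        {
            policy.Lock();
            try { return count; }
            finally { policy.Unlock(); }
        }
    }
}
```

Callers pick the trade-off per instance, e.g. new Counter<MonitorPolicy>() when it is shared across threads and new Counter<NoLockPolicy>() when it is not.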
Another way I have had some success with is using interfaces to achieve functional separation. The cost of this approach is that you end up with some fields 'repeated', because each interface requires total separation from the others' fields.
In my case I had 2 threads that needed to pass over a set of data that is potentially large and needs as little garbage collection as possible. I.e., I only want to pass change information from the first stage to the second, and then have the first process the next work unit.
This was achieved by the use of change buffers to pass changes from one interface to the next.
This allows one thread to work away at one interface, make all its changes and then publish a struct containing the changes that the other interface (thread) needs to apply prior to its work.
By doing this you have a double buffer (thread 1 produces a change report whilst thread 2 consumes the last report). If you add more interfaces (and threads), it appears as if there are pulses of work moving through the threads.
This was based on my research and I have no doubt that there are better methods available now.
My aim when coming up with this, however, was to avoid the need for locks in the vast majority of code by designing out race conditions. The other major consideration is performance in garbage collection, which may not be an issue for you.
This way is all good until you need complex interactions between threads. Then you find that you start forcing the layout of your buffer structures for reuse to get around inheritance, which in turn has an upkeep overhead.
A little more information on the problem to help...
The heuristic I'm using is to solve TSP-like problems. What happens right at the start of each calculation is that all the aspects that form the problem (salesmen/places to visit) are cloned so they aren't affected across threads.
This means each thread can change data (such as the stock left on a salesman), as there are a number of values that change during the calculation as things progress. What I'd quite like to do is allow checks such as HasSufficientStock(), for a simple example, to be overridden by a developer using the library.
Unfortunately, at present, to add further protection across threads and to make some simpler, lightweight classes, I convert them to these internal classes, and these are the things that are actually used and cloned.
For example
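class A
{
    public double Stock { get; }
    // Processing and cloning actually works using these InternalA's
    internal InternalA ConvertToInternal()
    {
        return new InternalA { Stock = this.Stock };
    }
}
internal class InternalA : ICloneable
{
    public double Stock { get; set; }
    // Placeholder check; the real rule is not shown here
    public bool HasSufficientStock() { return this.Stock > 0; }
    public object Clone() { return this.MemberwiseClone(); }
}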

Is it a good idea to create a custom type for the primary key of each data table?

We have a lot of code that passes about “Ids” of data rows; these are mostly ints or guids. I could make this code safer by creating a different struct for the id of each database table. Then the type checker will help to find cases when the wrong ID is passed.
E.g. the Person table has a column called PersonId and we have code like:
DeletePerson(int personId)
DeleteCar(int carId)
Would it be better to have:
struct PersonId
{
private int id;
// GetHashCode etc....
}
DeletePerson(PersonId personId)
DeleteCar(CarId carId)
Has anyone got real-life experience of doing this?
Is it worth the overhead?
Or more pain than it is worth?
(It would also make it easier to change the data type in the database of the primary key; that is why I thought of this idea in the first place.)
Please don’t say use an ORM or some other big change to the system design, as I know an ORM would be a better option, but that is not under my power at present. However, I can make minor changes like the above to the module I am working on at present.
Update:
Note this is not a web application and the Ids are kept in memory and passed about with WCF, so there is no conversion to/from strings at the edge. There is no reason that the WCF interface can’t use the PersonId type etc. The PersonId type etc. could even be used in the WPF/WinForms UI code.
The only inherently "untyped" bit of the system is the database.
This seems to be down to the cost/benefit of spending time writing code that the compiler can check better, or spending the time writing more unit tests. I am coming down more on the side of spending the time on testing, as I would like to see at least some unit tests in the code base.
It's hard to see how it could be worth it: I recommend doing it only as a last resort and only if people are actually mixing identifiers during development or reporting difficulty keeping them straight.
In web applications in particular it won't even offer the safety you're hoping for: typically you'll be converting strings into integers anyway. There are just too many cases where you'll find yourself writing silly code like this:
int personId;
if (Int32.TryParse(Request["personId"], out personId)) {
    this.person = this.PersonRepository.Get(new PersonId(personId));
}
Dealing with complex state in memory certainly improves the case for strongly-typed IDs, but I think Arthur's idea is even better: to avoid confusion, demand an entity instance instead of an identifier. In some situations, performance and memory considerations could make that impractical, but even those should be rare enough that code review would be just as effective without the negative side-effects (quite the reverse!).
I've worked on a system that did this, and it didn't really provide any value. We didn't have ambiguities like the ones you're describing, and in terms of future-proofing, it made it slightly harder to implement new features without any payoff. (No ID's data type changed in two years, at any rate. It could certainly happen at some point, but as far as I know, the return on investment for that is currently negative.)
I wouldn't make a special id for this. This is mostly a testing issue. You can test the code and make sure it does what it is supposed to.
You can create a standard way of doing things in your system that helps future maintenance (similar to what you mention) by passing in the whole object to be manipulated. Of course, if you named your parameter (int personID) and had documentation, then any non-malicious programmer should be able to use the code effectively when calling that method. Passing a whole object will do the type matching that you are looking for, and that should be enough of a standardized way.
I just see having a special structure made to guard against this as adding more work for little benefit. Even if you did this, someone could come along and find a convenient way to make a 'helper' method and bypass whatever structure you put in place anyway so it really isn't a guarantee.
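As a rough sketch of the "pass the whole object" idea, assume hypothetical Person and Car entity classes and a hypothetical in-memory repository standing in for the real data access code (none of these names come from the question's code base):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity types; only the Id property matters here.
public class Person { public int Id { get; set; } }
public class Car { public int Id { get; set; } }

// Hypothetical in-memory repository standing in for real data access code.
public class Repository
{
    private readonly HashSet<int> personIds = new HashSet<int>();
    private readonly HashSet<int> carIds = new HashSet<int>();

    public void Add(Person person) { personIds.Add(person.Id); }
    public void Add(Car car) { carIds.Add(car.Id); }

    // Taking the entity rather than an int lets the compiler reject
    // DeletePerson(someCar), which DeletePerson(int) cannot do.
    public bool DeletePerson(Person person)
    {
        if (person == null) throw new ArgumentNullException("person");
        return personIds.Remove(person.Id);
    }

    public bool DeleteCar(Car car)
    {
        if (car == null) throw new ArgumentNullException("car");
        return carIds.Remove(car.Id);
    }
}
```

Here a person and a car can share the raw value 42 without any risk of one being deleted in place of the other, since the method signatures no longer accept a bare int.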
You can just opt for GUIDs, like you suggested yourself. Then you won't have to worry about passing a person ID of "42" to DeleteCar() and accidentally deleting the car with an ID of 42. GUIDs are unique; if you pass a person GUID to DeleteCar in your code because of a programming typo, that GUID will not be the PK of any car in the database.
You could create a simple Id class which can help differentiate in code between the two:
public class Id<T>
{
    private int RawValue { get; set; }
    public Id(int value)
    {
        this.RawValue = value;
    }
    public static explicit operator int (Id<T> id) { return id.RawValue; }
    // this cast is optional and can be excluded for further strictness
    public static implicit operator Id<T> (int value) { return new Id<T>(value); }
}
Used like so:
class SomeClass
{
    public Id<Person> PersonId { get; set; }
    public Id<Car> CarId { get; set; }
}
Assuming your values would only be retrieved from the database, unless you explicitly cast the value to an integer, it is not possible to use the two in each other's place.
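To illustrate what the compiler then catches, here is a small self-contained sketch. It repeats a compact version of Id<T> without the optional implicit conversion from int, and the Person, Car, Garage, and Demo types are placeholders invented for the example.

```csharp
using System;

// Placeholder entity types used only as type tags.
public class Person { }
public class Car { }

// Compact version of the Id<T> wrapper, without the implicit int conversion.
public class Id<T>
{
    private readonly int rawValue;
    public Id(int value) { rawValue = value; }
    public static explicit operator int(Id<T> id) { return id.rawValue; }
}

public static class Garage
{
    // Hypothetical method that only accepts a car identifier.
    public static string DeleteCar(Id<Car> carId)
    {
        return "deleted car " + (int)carId;
    }
}

public static class Demo
{
    public static void Main()
    {
        var personId = new Id<Person>(42);
        var carId = new Id<Car>(42);

        Console.WriteLine(Garage.DeleteCar(carId));

        // The next line does not compile: there is no conversion
        // from Id<Person> to Id<Car>.
        // Garage.DeleteCar(personId);
    }
}
```

If the implicit conversion from int is kept, a raw integer can still be passed where an Id<Car> is expected, so leaving it out gives the stricter check.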
I don't see much value in custom checking in this case. You might want to beef up your testing suite to check that two things are happening:
Your data access code always works as you expect (i.e., you aren't loading inconsistent Key information into your classes and getting misuse because of that).
That your "round trip" code is working as expected (i.e., that loading a record, making a change and saving it back isn't somehow corrupting your business logic objects).
Having a data access (and business logic) layer you can trust is crucial to being able to address the bigger-picture problems you will encounter attempting to implement the actual business requirements. If your data layer is unreliable you will be spending a lot of effort tracking (or worse, working around) problems at that level that surface when you put load on the subsystem.
If instead your data access code is robust in the face of incorrect usage (what your test suite should be proving to you) then you can relax a bit on the higher levels and trust they will throw exceptions (or however you are dealing with it) when abused.
The reason you hear people suggesting an ORM is that many of these issues are dealt with in a reliable way by such tools. If your implementation is far enough along that such a switch would be painful, just keep in mind that your low-level data access layer needs to be as robust as a good ORM if you really want to be able to trust (and thus, to a certain extent, forget about) your data access.
Instead of custom validation, you could use dependency injection: as the tests run, your test suite injects code that does robust tests of your keys (hitting the database to verify each change), while production injects code that omits or restricts such tests for performance reasons. Your data layer will throw errors on failed keys (if you have your foreign keys set up correctly there), so you should also be able to handle those exceptions.
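One way to picture this arrangement is the sketch below. The IKeyValidator interface, both implementations, and CarService are hypothetical names invented for illustration, and the "database" is just a set of known keys held in memory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction that the data layer calls before using a key.
public interface IKeyValidator
{
    void EnsureCarExists(int carId);
}

// Injected while tests run: verifies every key. A real version would
// query the database; a HashSet stands in for it here.
public class StrictKeyValidator : IKeyValidator
{
    private readonly HashSet<int> knownCarIds;

    public StrictKeyValidator(IEnumerable<int> knownCarIds)
    {
        this.knownCarIds = new HashSet<int>(knownCarIds);
    }

    public void EnsureCarExists(int carId)
    {
        if (!knownCarIds.Contains(carId))
            throw new ArgumentException("No car with id " + carId, "carId");
    }
}

// Injected in production: skips the check for performance and relies
// on foreign key constraints to reject bad keys.
public class NoOpKeyValidator : IKeyValidator
{
    public void EnsureCarExists(int carId) { }
}

public class CarService
{
    private readonly IKeyValidator validator;

    public CarService(IKeyValidator validator) { this.validator = validator; }

    public void DeleteCar(int carId)
    {
        validator.EnsureCarExists(carId);
        // ... the actual delete would go here ...
    }
}
```

The calling code is identical in both configurations; only the validator passed to the constructor changes.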
My gut says this just isn't worth the hassle. My first question to you would be whether you have actually found bugs where the wrong int was being passed (a Car ID instead of a Person ID in your example). If so, it is probably more a symptom of a poor overall architecture, in that your domain objects have too much coupling and are passing too many arguments around in method parameters rather than acting on internal variables.

Is it true I should not do "long running" things in a property accessor?

And if so, why?
and what constitutes "long running"?
Doing magic in a property accessor seems like my prerogative as a class designer. I always thought that is why the designers of C# put those things in there - so I could do what I want.
Of course it's good practice to minimize surprises for users of a class, and so embedding truly long-running things (e.g., a 10-minute Monte Carlo analysis) in a method makes sense.
But suppose a prop accessor requires a db read. I already have the db connection open. Would db access code be "acceptable", within the normal expectations, in a property accessor?
Like you mentioned, it's a surprise for the user of the class. People are used to being able to do things like this with properties (contrived example follows:)
foreach (var item in bunchOfItems)
    foreach (var slot in someCollection)
        slot.Value = item.Value;
This looks very natural, but if item.Value actually is hitting the database every time you access it, it would be a minor disaster, and should be written in a fashion equivalent to this:
foreach (var item in bunchOfItems)
{
    var temp = item.Value;
    foreach (var slot in someCollection)
        slot.Value = temp;
}
Please help steer people using your code away from hidden dangers like this, and put slow things in methods so people know that they're slow.
There are some exceptions, of course. Lazy-loading is fine as long as the lazy load isn't going to take some insanely long amount of time, and sometimes making things properties is really useful for reflection- and data-binding-related reasons, so maybe you'll want to bend this rule. But there's not much sense in violating the convention and violating people's expectations without some specific reason for doing so.
In addition to the good answers already posted, I'll add that the debugger automatically displays the values of properties when you inspect an instance of a class. Do you really want to be debugging your code and have database fetches happening in the debugger every time you inspect your class? Be nice to the future maintainers of your code and don't do that.
Also, this question is extensively discussed in the Framework Design Guidelines; consider picking up a copy.
A db read in a property accessor would be fine; that's actually the whole point of lazy-loading. I think the most important thing would be to document it well so that users of the class understand that there might be a performance hit when accessing that property.
You can do whatever you want, but you should keep the consumers of your API in mind. Accessors and mutators (getters and setters) are expected to be very lightweight. With that expectation, developers consuming your API might make frequent and chatty calls to these properties. If you are consuming external resources in your implementation, there might be an unexpected bottleneck.
For consistency's sake, it's good to stick with convention for public APIs. If your implementations will be exclusively private, then there's probably no harm (other than an inconsistent approach to solving problems privately versus publicly).
It is just a "good practice" not to write property accessors that take a long time to execute.
That's because properties look like fields to the caller, and hence the caller (a user of your API, that is) usually assumes there is nothing more than just a "return smth;"
If you really need some "action" behind the scenes, consider creating a method for that...
I don't see what the problem is with that, as long as you provide XML documentation so that IntelliSense notifies the object's consumer of what they're getting themselves into.
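For instance, a documented getter might look roughly like the sketch below. The Customer class, the OrderCount property, and the LoadOrderCount helper are hypothetical names; the helper returns a fixed value in place of a real query.

```csharp
using System;

public class Customer
{
    /// <summary>
    /// Gets the number of orders placed by this customer.
    /// </summary>
    /// <remarks>
    /// Each access reads from the database and may be slow;
    /// copy the value to a local variable before using it in a loop.
    /// </remarks>
    public int OrderCount
    {
        get { return LoadOrderCount(); }
    }

    // Hypothetical stand-in for the real database query.
    private int LoadOrderCount() { return 3; }
}
```

The summary text is what shows up in the IntelliSense tooltip when a caller types the property name.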
I think this is one of those situations where there is no one right answer. My motto is "Saying always is almost always wrong." You should do what makes the most sense in any given situation without regard to broad generalizations.
A database access in a property getter is fine, but try to limit the number of times the database is hit by caching the value.
There are many times that people use properties in loops without thinking about the performance, so you have to anticipate this use. Programmers don't always store the value of a property when they are going to use it many times.
Cache the value returned from the database in a private variable, if it is feasible for this piece of data. This way the accesses are usually very quick.
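A minimal sketch of that caching, assuming .NET 4's Lazy<T> is available and using it in place of a hand-rolled flag. The Account class, the LoadCount counter, and LoadUserNameFromDatabase are hypothetical, and the "database" call just returns a fixed string.

```csharp
using System;

public class Account
{
    private readonly Lazy<string> userName;

    // Counts loads so the caching behaviour can be observed.
    public int LoadCount { get; private set; }

    public Account()
    {
        // The factory runs on first access to Value; later reads reuse the result.
        userName = new Lazy<string>(LoadUserNameFromDatabase);
    }

    public string UserName
    {
        get { return userName.Value; }
    }

    // Hypothetical stand-in for a real database query.
    private string LoadUserNameFromDatabase()
    {
        LoadCount++;
        return "alice";
    }
}
```

However many times UserName is read in a loop, the load method runs only once per Account instance. The trade-off is that the cached value can go stale if the underlying row changes.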
This isn't directly related to your question, but have you considered going with a load-once approach in combination with a refresh parameter?
class Example
{
    private bool userNameLoaded = false;
    private string userName = "";
    public string UserName(bool refresh)
    {
        // Only invalidate the cache when a refresh is requested; assigning
        // !refresh would mark the value as loaded before it was ever fetched.
        if (refresh)
            userNameLoaded = false;
        return UserName();
    }
    public string UserName()
    {
        if (!userNameLoaded)
        {
            /*
            userName = SomeDBMethod();
            */
            userNameLoaded = true;
        }
        return userName;
    }
}
