How do I implement caching in an immutable way? - c#

I've read and heard a lot of good things about immutability, so I decided to try it out in one of my hobby projects. I declared all of my fields as readonly, and made all methods that would usually mutate an object to return a new, modified version.
It worked great until I ran into a situation where a method should, by external protocol, return a certain information about an object without modifying it, but at the same time could be optimized by modifying the internal structure. In particular, this happens with tree path compression in a union find algorithm.
When the user calls int find(int n), object appears unmodified to the outsider. It represents the same entity conceptually, but it's internal fields are mutated to optimize the running time.
How can I implement this in an immutable way?

Short answer: you have to ensure the thread-safety by yourself.
The readonly keyword on a field gives you the insurance that the field cannot be modified after the object containing this field has been constructed.
So the only write you can have for this field is contained in the constructor (or in the field initialization), and a read through a method call cannot occur before the object is constructed, hence the thread-safety of readonly.
If you want to implement caching, you break the assumption that only one write occurs (since "caching writes" can and will occur during you reads), and thus there can be threading problems in bad cases (think you're reading lines from a file, two threads can call the find method with the same parameter but read two different lines and therefore get different results).
What you want to implement is observational immutability. This related question about memoization may help you with an elegant answer.

Related

Why CancellationToken is a struct?

Does it make any sense to use a struct instead of a reference type in case of CancellationToken?
I see one possible disadvantage, it will be copied all the way down in methods chain as I pass it as a parameter.
In the same time, as far as it is struct, it might be allocated and disposed faster.
If we want to make it immutable, we can use readonly properties or private setters.
So what was an idea behind it?
There is an article that describes the .NET Cancellation design here which is worth a read. In relation to your question, the following is asked in the comments:
Just out of interest, why is the CancellationToken a value type?
and the questioner proposes an alternative implementation with a single shared instance of CancellationToken as a reference type.
And the response by Mike Liddell:
This would certainly work and is largely equivalent (and we implemented it this way during early prototyping), but we went with the current design for two particular reasons:
– only one class instance per CTS/Token, hence less GC pressure.
– we consolidate all of the state and most of the logic onto CTS. The split arrangement was a somewhat more convoluted.
I would note that the current value type implementation is exactly the same size as a reference so there isn't any additional copying overhead. It also prevents some additional boilerplate null checks in user-code, especially when making the token an optional parameter.
Does it make any sense to use a struct instead of a reference type in case of CancellationToken?
Yes.
I see one possible disadvantage, it will be copied all the way down in methods chain as I pass it as a parameter.
That is not a disadvantage. A cancellation token is reference-sized. Why would there be a disadvantage of passing a reference-sized struct vs passing a reference? This objection doesn't make sense. Please explain why you think this is a "disadvantage".
In the same time, as far as it is struct, it might be allocated and disposed faster.
That's correct, but the actual win is more likely that a reference-sized struct that wraps a reference does not increase collection pressure. Many of the design and implementation decisions in the .NET framework are designed to ensure that collection pressure is not increased too much by framework code.
So what was an idea behind it?
It's a small type that is logically a value; why shouldn't it be a value type?

Is it better to pass references down a chain or to use public static variables

Say we have a Game class.
The game class needs to pass down a reference to it's spritebatch. That is, the class calls a method passing it, and that method in turn passes it to other methods, until it is finally used.
Is this bad for performance? Is it better to just use statics?
I see one obvious disadvantage of statics though, being unable to make duplicate functionality in the same application.
It is not easy to answer your question as you have not specifically mentioned the requirement but generally i can give you some advice.
Always consider encapsulation: Do not expose the properties if they are not used else where.
Performance :For reference types, there is no any performance penalty, as they are already a reference type.but if your type is a value type then there will be a very small performance penalty.
So there is a Design or Performance trade off exists, Unless your method is called millions of times, you never have to think about public static property.
There are cons and pros like in everything.
Is this is a good or bad from performance point of view, depends on how computational intensive and how often used that code inside your game.
So here are my considerations on subject.
Passing like parameter:
Cons : passing more variable on stack, to push it into the function call.It's very fast, but again, it depends how the code you're talking about is used, so absence of it can bring some benefits, that's why inserted this point in cons.
Pros : you esplicitly manifest that the function on top of calling stack needs that parameter for read and/or write, so one looking on that code could easily imagine semantic dependencies of your calls.
Use like static:
Cons : There is no clear evidence (if not via direct knowledge or good written documentation) what parameters would or could affect the calculus inside that functions.
Pros : You don't pass it on the stack for all functions in chain.
I would personally recommend: use it like a parameter, because this clearly manifests what calling code depends on and even if there would be some measurable performance drawback, most probably it will not be relevant in your case. But again, as Rico Mariani always suggests: measure, measure, measure...
Statics is mostly not the best way. Because if later one you want to make multiple instances you might be in trouble.
Of course passing references cost a bit of performance, but depending on the amount of creation of instances it will matter more or less. Unless you are creating millions of objects every small amount of time it might be an issue.

Why doesn't the .NET framework provide a method to deep copy objects? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I was using a custom method to deep clone objects the other day, and I know you can deep clone in different ways (reflection, binary serialization, etc), so I was just wondering:
What is/are the reason(s) that Microsoft does not include a deep copy method in the framework?
The problem is substantially harder than you seem to realize, at least in the general case.
For starters, a copy isn't just deep or shallow, it's a spectrum.
Let's imagine for a second that we have a list of arrays of strings, and we want to make a copy of it.
We start out at the shallowest level, we just copy the reference of the whole thing to another variable. Any changes to the list referenced from either variable is seen by the other.
So now we go and create a brand new list to give to the second variable. For each item in the first list we add it to the second list. Now we can modify the list referenced from either variable without it being seen by the other one. But what if we grab the first item of a list and change the string that's in the first array, it will be seen by both lists!
Now we're going through and creating a new list, and for each array in the first list we're creating a new array, adding each of the strings in the underlying array to the new array, and adding each of those new arrays to the new list. Now we can mutate any of the arrays in either list without seeing the changes. But wait, both lists are still referencing the same strings (which are value types after all; they internally have a character array for their data). What if some mean person were to come along and mutate one of the strings (using unsafe code you could actually do this)! So now you're copying all of the strings with a deep copy. But what if we don't need to do that? What if we know that nobody is so mean that they would mutate the string? Or, for that matter, what if we know that none of the arrays will be mutated (or that if they will be, that they're supposed to be reflected by both lists).
Then of course there are problems such as circular references, fields in a class that don't really represent it's state (i.e. possibly cached values or derived data that could just be re-calculated as-needed by a clone).
Realistically you'd need to have every type implement IClonable or some equivalent, and have it's own custom code for how to clone itself. This would be a lot of work to maintain for a language, especially since there are so many ways that complex objects could possibly be cloned. The cost would be quite high, and the benefits (outside of a handful of objects that it is deemed worthwhile to implement clone methods for) are generally not worth it. You, as a programmer, and write your own logic for cloning a type based on how deep you know you need to go.
It's similar to how it works (or doesn't work) in C and C++:
To do a deep copy, you actually have to know how different data is interpreted. In trivial cases, a shallow copy (which is provided) is the same as a deep copy. But once this is no longer true, it really depends on the implementation and interpretation. There's no general rule of thumb.
Let's use a game as a simple example:
A NPC object has two integers as members. One integer represents its health points, the other one is its unique ID.
If you clone the NPC, you have to keep the amount of health, while changing the unique ID. This is something the compiler/runtime can't determine on their own. You have to code this, essentially telling the program "how to copy".
I can think of two possible solutions:
Add a keyword to denote things that can't be copied. While this sounds like a good idea, it doesn't really solve the issue. You can tell the compiler that UniqueID must not copied, but at the same time you can't define how this should happen. And even if you could, you could just...
Create a copy constructor (C++) or a method to copy/clone the object (C#, e.g. CopyTo()).
Hmm.. My view is that:
A) because very rarely you want to have the copy really deep
B) because the framework cannot guarantee to know how to truly and meaningfully CLONE an object
C) because implementing deep-cloning in a naiive way is simple and takes one method and several lines of code using reflection and recursion
but I'll try to find an old MSDN article that covered that
edit: I've not found :( I'm still sure that I saw it somewhere, but I cannot google-it-out now.. However some useful links about related ICloneable and derived:
http://blogs.msdn.com/b/brada/archive/2004/05/03/125427.aspx
http://blogs.msdn.com/b/mrtechnocal/archive/2009/10/19/why-not-icloneable-t.aspx
https://stackoverflow.com/a/3712564/717732
So, as I've not found the author's words, let me expand the points:
A: because very rarely you want to have the copy really deep
You see, how can the framework guess how deep should it be in general? Let's assume that completely-deep and let's assume it has been implemented. Now we have memberwise-clone and total-clone methods. Still, there are some cases when people will need clone-me-but-not-the-root-base. So they post another questions why the total-clone has no way of cutting off the raw base. Or second-to-raw. Etc. Providing deep-clone solves almost nothing from the .Net team's point of view, as we, the users, will still rant about that just because we see some partial tools and are lazy and want to have everything:)
B) because the framework cannot guarantee to know how to truly and meaningfully CLONE an object
Especially with some special objects with handles or native-like IDs like those from Entity Framework, .Net Remoting Proxies, COM-wrappers etc: You might sucessfully read and clone the upper class hierarchy layers, but eventually, somewhere below you find some arcane thingies like IntPtrs that you just know that you should not copy. Most of the times. But sometimes you can. But the framework's code must be universal. Deep-cloning would either have to be harshly complicated with many sanity checks against specially-looking class members, or it would produce dangerous results if the programmer would invoke it on something that has base classes that the programmer did not care to analyze.
B+) Also, please note that the more base classes you have in your tree, the more probably is that they will have some parameterized constructors, which might indicate that direct-copying is not a good idea. Direct-copiable classes usually have parameterless constructors and all the copiable data accessible by properties..
B++) From the framework's designer point of view, taking memory and speed concerns, shallow copying is almost always very fast, while deep copying is just the opposite. It is beneficial to the framework's and platform's reputation to NOT allow the developers to freely deep-copy huge objects. Anyways, would you need a deep-copy if your object was lightweight and simple, huh? :) Not providing a deep-copy encourages the developers to think around the need of deep-copy, what usually makes the application lighter and faster.
C) because implementing deep-cloning in a naiive way is simple and takes one method and several lines of code using reflection and recursion
Having a shallow copy, how hard it is to actually write a deep copy? Not so hard! Just implement a method that is given an object 'obj':
pseudocode:
object deepcopier(object obj)
newobject = obj.shallowcopy()
foreach(field in newobject.fields)
newobject.field = deepcopier(newobject.field)
return newobject
and well, that's all. Of course the field enumeration must be performed by Reflection, and also reading/writing the fields - too.
However, this way is very naiive. It this has a serious flaw: what if some object has two fields that point to the same another object? We should detect it and do the cloning once then assign both fields to that one clone. Also if an object pointed by some field has reference to some object that is also pointed by another object (...) - that may also need to be tracked and cloned only once. Also, how about cycles? if somewhere there deep in the tree, an object has a reference back to the root? Such algo like above would happily descent and would re-copy everything once again, then again, and eventually would choke with StackOverflow.
This makes the cloning quite hard to be tracked and starts to look more like serialization. In fact if your class is a DataContract or Serializable, you can simply serialize it and deserialize to get a perfect deep copy :)
Deep-cloning is hard to do in an universal way, unless you know what the object means and what all its fields mean and know which ones should really be cloned and which should be unified. If you, as developer, know that this is just a data-object that is perfectly safe to deep-clone, so whydontya just make it Serializable? If you can't make it Serializable, then probably you also can't deep-clone it!

C# Code Contracts: What can be statically proven and what can't?

I might say I'm getting quite familiar with Code Contracts: I've read and understood most of the user manual and have been using them for quite a while now, but I still have questions. When I search SO for 'code contracts unproven' there are quite a few hits, all asking why their specific statement couldn't be statically proven. Although I could do the same and post my specific scenario (which is btw:
),
I'd much rather understand why any Code Contract condition can or can't be proven. Sometimes I'm impressed with what it can prove, and sometimes I'm... well... to say it politely: definately not impressed. If I want to understand this, I'd like to know the mechanisms the static checker uses. I'm sure I'll learn by experience, but I'm spraying Contract.Assume statements all over the place to make the warnings go away, and I feel like that's not what Code Contracts are meant for. Googling didn't help me, so I want to ask you guys for your experiences: what (unobvious) patterns have you seen? And what made you see the light?
The contract in your construction is not satisfied. Since you are referencing an object’s field (this.data), other threads may have access to the field and may change its value between the Assume and the first parameter resolution and the third parameter resolution. (e.i., they could be three completely different arrays.)
You should assign the array to a local variable, then use that variable throughout the method. Then the analyzer will know that the constraints are being satisfied, because no other threads will have the ability to change the reference.
var localData = this.data;
if (localData == null) return;
byte[] newData = new byte[localData.Length]; // Or whatever the datatype is.
Array.Copy(localData, newData, localData.Length); // Now, this cannot fail.
This has the added benifit of not only satisfying the constraint, but, in reality, making the code more robust in many cases.
I hope this leads you to the answer to your question. I could not actually answer your question directly, because I do not have access to a version of Visual Studio that includes the static checker. (I'm on VS2008 Pro.) My answer is based on what my own visual inspection of the code would conclude, and it appears that the static contract checker uses similar techniques. I am intreagued! I need to get me one of them. :-D
UPDATE: (Lots of speculation to follow)
Upon reflection, I think I can do a pretty good guess of what can or can't be proven (even without access to the static checker). As stated in the other answer, the static checker does not do interprocedural analysis. Therefore, with the looming possibility of multi-threaded variable accesses (as in the OP), the static checker can only deal effectively with local variables (as defined below).
By "local variables" I mean a variable that cannot be accessed by any other thread. This would include any variables declared in the method or passed as a parameter, unless the parameter is decorated with ref or out or the variable is captured in an anonymous method.
If a local variable is a value-type, then its fields are also local variables (and so on recursively).
If a local variable is a reference-type, then only the reference itself—not its fields—can be considered a local variable. This is true even of an object constructed within the method, since a constructor itself may leak a reference to the constructed object (say to a static collection for caching, for example).
So long as the static checker does not do any interprocedural analysis, any assumptions made about variables that are not local as defined above can be invalidated at any time, and, therefore, are ignored in the static analysis.
Exception 1: since strings and arrays are known by the runtime to be immutable, their properties (such as Length) are subject to analysis, so long as the string or array variable itself is local. This does not include the contents of an array which are mutable by other threads.
Exception 2: The array constructor may be known by the runtime not to leak any references to the constructed array. Therefore, an array that is constructed within the method body and not leaked outside of the method (passed as a parameter to another method, assigned to a non-local variable, etc.) has elements that may also be considered local variables.
These restrictions seem rather onerous, and I can imagine several ways this could be improved, but I don't know what has been done. Here are some other things that could, in theory, be done with the static checker. Someone who has it handy should check to see what has been done and what hasn't:
It could determine if a constructor does not leak any references to the object or its fields and consider the fields of any object so constructed to be local variables.
A no-leaks analysis could be done on other methods to determine whether a reference type passed to a method can still be considered local after that method invocation.
Variables decorated with ThreadStatic or ThreadLocal may be considered local variables.
Options could be given to ignore the possibility of using reflection to modify values. This would allow private readonly fields on reference types or static private readonly fields to be considered immutable. Also, when this option is enabled, a private or internal variable X that is only ever accessed inside a lock(X){ /**/ } construction and which is not leaked could be considered a local variable. However, these things would, in effect, reduce the reliability of the static checker, so that's kinda iffy.
Another possibility that could open up a lot of new analysis would be declaratively assigning variables and the methods that use them (and so on recursively) to a particular unique thread. This would be a major addition to the language, but it might be worth it.
The short answer is that the static code analyzer appears to be very limited. For instance, it does not detect
readonly string name = "I'm never null";
as being an invariant. From what I can gather on MSDN forums, it analyzes every method by itself (for performance reasons, not that one should think it could get much slower), which limits its knowledge when verifying the code.
To strike a balance between the academically lofty goal of proving correctness and being able to get work done, I've resorted to decorating individual methods (or even classes, as needed) with
[ContractVerification(false)]
rather than sprinkle the logic with lots of Assumes. This may not be best practice for using CC, but it does provide a way to get rid of warnings without unchecking any of the static checker options. In order not to lose pre/post-condition checks for such methods I generally add a stub with the desired conditions and then invoke the excluded method to perform the actual work.
My own assessment of Code Contracts is that it's great if you're only using the official framework libraries and do not have a lot of legacy code (e.g. when starting a new project). Anything else and it's a mixed bag of pleasure and pain.

Is it good practice to use reflection in your business logic?

I need to work on an application that consists of two major parts:
The business logic part with specific business classes (e.g. Book, Library, Author, ...)
A generic part that can show Books, Libraries, ... in data grids, map them to a database, ...).
The generic part uses reflection to get the data out of the business classes without the need to write specific data-grid or database logic in the business classes. This works fine and allows us to add new business classes (e.g. LibraryMember) without the need to adjust the data grid and database logic.
However, over the years, code was added to the business classes that also makes use of reflection to get things done in the business classes. E.g. if the Author of a Book is changed, observers are called to tell the Author itself that it should add this book to its collection of books written by him (Author.Books). In these observers, not only the instances are passed, but also information that is directly derived from the reflection (the FieldInfo is added to the observer call so that the caller knows that the field "Author" of the book is changed).
I can clearly see advantages in using reflection in these generic modules (like the data grid or database interface), but it seems to me that using reflection in the business classes is a bad idea. After all, shouldn't the application work without relying on reflection as much as possible? Or is the use of reflection the 'normal way of working' in the 21st century?
Is it good practice to use reflection in your business logic?
EDIT: Some clarification on the remark of Kirk:
Imagine that Author implements an observer on Book.
Book calls all its observers whenever some field of Book changes (like Title, Year, #Pages, Author, ...). The 'FieldInfo' of the changed field is passed in the observer.
The Author-observer then uses this FieldInfo to decide whether it is interested in this change. In this case, if FieldInfo is for the field Author of Book, the Author-Observer will update its own vector of Books.
The main danger with Reflection is that the flexibility can escalate into disorganized, unmaintainable code, particularly if more junior devs are used to make changes, who may not fully understand the Reflection code or are so enamored of it that they use it to solve every problem, even when simpler tools would suffice.
My observation has been that over-generalization leads to over-complication. It gets worse when the actual boundary cases turn out to not be accommodated by the generalized design, requiring hacks to fit in the new features on schedule, transmuting flexibility into complexity.
I avoid using reflection. Yes, it makes your program more flexible. But this flexibility comes at a high price: There is no compile-time checking of field names or types or whatever information you're collecting through reflection.
Like many things, it depends on what you're doing. If the nature of your logic is that you NEVER compare the field names (or whatever) found to a constant value, then using reflection is probably a good thing. But if you use reflection to find field names, and then loop through them searching for the fields named "Author" and "Title", you've just created a more-complex simulation of an object with two named fields. And what if you search for "Author" when the field is actually called "AuthorName", or you intend to search for "Author" and accidentally type "Auhtor"? Now you have errors that won't show up until runtime instead of being flagged at compile time.
With hard-coded field names, your IDE can tell you every place that a certain field is used. With reflection ... not so easy to tell. Maybe you can do a text search on the name, but if field names are passed around as variables, it can get very difficult.
I'm working on a system now where the original authors loved reflection and similar techniques. There are all sorts of places where they need to create an instance of a class and instead of just saying "new" and the class, they create a token that they look up in a table to get the class name. What does this gain? Yes, we could change the table to map that token to a different name. And this gains us ... what? When was the last time that you said, "Oh, every place that my program creates an instance of Customer, I want to change to create an instance of NewKindOfCustomer." If you have changes to a class, you change the class, not create a new class but keep the old one around for nostalgia.
To take a similar issue, I make a regular practice of building data entry screens on the fly by asking the database for a list of field names, types, and sizes, and then laying it out from there. This gives me the advantage of using the same program for all the simpler data entry screens -- just pass in the table name as a parameter -- and if a field is added or deleted, zero code change is required. But this only works as long as I don't care what the fields are. Once I start having validations or side effects specific to this screen, the system is more trouble than it's worth, and I'm better off to fall back to more explicit coding.
Based on your edit, it sounds like you are using reflection purely as a mechanism for identifying fields. This is as opposed to dynamic behavior such as looking up the fields, which should be avoided when possible (since such lookups usually use strings which ruin static type safety). Using FieldInfo to provide an identifier for a field is fairly harmless, though it does expose some internals (the info class) in a way that is not entirely ideal.
I tend not to use reflection where i can help it. by using interfaces and coding against these i can do a lot of things that some would use reflection for.
But im a big fan of if it works, it works.
Also by using reflection you probably have something that can adapt fairly easily.
Ie the only objection most would have is fairly religious ... and if your performance is fine and the code is maintainable and clear .... who cares?
Edit: based on your edit i would indeed use interfaces to achieve what you want. Unless i misunderstand you.
I think it is a good idea to stay away from Reflection when possible, but dont be afraid to resort to it when it provides a better or more flexible solution to your problem. The performance hit for anything but tight loop operations is likely to be minimal in the overall scheme of an application or Web Form request.
Just a good article to share about reflection -
http://www.simple-talk.com/dotnet/.net-framework/a-defense-of-reflection-in-.net/
I tend to use interfaces in my business layer and leave the reflection to my presentation layer. This is not an absolute but rather a guideline.

Categories