I'm having trouble creating copies of my class instances from a dictionary of templates. It appears that MemberwiseClone() leaves some fields referenced to the dictionary's template fields. I'd like to be able to see if that's so in a convenient way, like Visual Studio's DataTips provide.
Is there a way to find out if an instance of a reference type object (or its fields) is referencing another instance of the same type (after memberwise cloning)?
The rule is that MemberwiseClone() copies every value-type field by value and, for every reference-type field, copies only the reference. It is a shallow copy.
If that is not the behaviour that you want then you need to roll your own clone method.
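A minimal sketch of that rule, and of one way to check it from code: object.ReferenceEquals tells you whether two references point at the same instance. The Template class, its fields and the CloneCheck helper are hypothetical names, not taken from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical template type: one value-type field, one reference-type field.
public class Template
{
    public int Level;
    public List<string> Tags = new List<string>();

    // MemberwiseClone copies Level by value and Tags by reference.
    public Template ShallowCopy()
    {
        return (Template)MemberwiseClone();
    }

    // A hand-rolled clone that also gives the copy its own list.
    public Template DeepCopy()
    {
        var copy = (Template)MemberwiseClone();
        copy.Tags = new List<string>(Tags);
        return copy;
    }
}

public static class CloneCheck
{
    // True when both instances still share the same Tags list.
    public static bool SharesTags(Template a, Template b)
    {
        return object.ReferenceEquals(a.Tags, b.Tags);
    }
}
```

SharesTags(original, original.ShallowCopy()) is true, while SharesTags(original, original.DeepCopy()) is false. The same ReferenceEquals check can be typed into a watch or immediate window while debugging.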
You are probably talking about a deep copy, in which case this will tell you what you need to know: How do you do a deep copy of an object in .NET (C# specifically)?
As for counting the number of references to an instance, Eric Lippert says that C# does not do reference counting (see "C# - Get number of references to object"), so again you would have to roll your own. But I don't think that is what you want to do.
You could use a memory profiler for manually checking the references. See .NET Memory Profiling Tools.
One "feature" of Java is that there's really only one non-primitive type: an object reference, which may be used in all sorts of ways. While that makes the framework easy to implement, it means that the type of a variable is insufficient to describe its meaning. While .net improves on Java in many ways, it shares that fundamental weakness.
For example, suppose an object George has a field Bob of type IList<String> [or, for Java, List<String>]. There are at least five fundamentally different things such a field could represent:
It holds a reference to an object holding a set of strings to which it will never allow any changes. If item #5 of that list is "Barney", then item #5 on that list will never be anything other than "Barney". In this scenario, `Bob` encapsulates only immutable aspects of the list. Further, the reference may be freely shared to any code that's interested in that aspect of George's state.
It holds a reference to an object holding a set of strings which may be modified by anyone holding a reference, but no entity with a reference to that object will modify the list nor allow it to be exposed to anything that might do so. In other words, while the list would allow its contents to be changed, nothing will in fact ever alter those contents. As above, `Bob` encapsulates only immutable aspects of the list, but `George` is responsible for maintaining such immutability by exposing the reference only to code that can be trusted not to modify the list.
It holds the only reference, anywhere in the universe, to an object holding a set of strings which it modifies at will. In this scenario, `Bob` encapsulates the *mutable state* of the list. If one copies `George`, one must make a new list with the same items as the old, and give the copy a reference to that. Note also that `George` cannot pass a reference to any code that might persist the reference, whether or not that code would try to modify the list.
It holds a reference to a list which is "owned" by some other object, which will be used either to add items to the list for the other object's benefit, or to observe things the other object has put in the list for `George`'s benefit. In this scenario, `Bob` encapsulates the *identity* of the list. In a correct clone, `Bob` must identify the *same list* as in the original.
It holds a reference to a list which it owns, and which will be mutated, but to which some other objects also hold a reference (perhaps so they can add things to the list for `George`'s benefit, or perhaps so they can see things `George` does with the list). In this scenario, `Bob` encapsulates *both mutable state and identity*. The existence of a field which encapsulates both aspects means that *it is not possible to make a semantically-correct copy of `George` without the cooperation of other objects*.
In short, it's possible for Bob to encapsulate the list's mutable state, its identity, both, or neither (immutable state, other than identity, is a 'freebie'). If it encapsulates only mutable state, a semantically-correct copy of George must have Bob reference a different list which is initialized with the same contents. If it encapsulates only identity, a semantically-correct copy must have Bob reference the same list. If it encapsulates both mutable state and identity, George cannot be properly cloned in isolation. Fields that do neither may be copied or not, as convenient.
If one can properly determine which fields encapsulate the referenced objects' mutable states, which ones encapsulate identity, and which ones both, it will be obvious what a semantically-correct cloning operation should do. Unfortunately, there's no standard convention in the Framework for categorizing fields in such fashion, so you'll have to come up with your own method and then a cloning scheme that uses it.
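As a rough sketch of what such a hand-written scheme could look like, here is a George with one field that encapsulates mutable state and one that encapsulates identity. The field names (ownedItems, sharedLog) and the Clone method are my own illustration, not a framework convention.

```csharp
using System.Collections.Generic;

public class George
{
    // Encapsulates mutable state: George holds the only reference to this list.
    private readonly List<string> ownedItems;

    // Encapsulates identity: the list belongs to some other object.
    private readonly IList<string> sharedLog;

    public George(IEnumerable<string> items, IList<string> sharedLog)
    {
        this.ownedItems = new List<string>(items); // always a private copy
        this.sharedLog = sharedLog;                // always the caller's list
    }

    public George Clone()
    {
        // Mutable state: the constructor builds a new list with the same contents.
        // Identity: the clone must refer to the very same log as the original.
        return new George(ownedItems, sharedLog);
    }

    public void Add(string item)
    {
        ownedItems.Add(item);
        sharedLog.Add("added " + item);
    }

    public int Count { get { return ownedItems.Count; } }

    public IList<string> Log { get { return sharedLog; } }
}
```

Adding an item to the clone changes only the clone's Count, but the entry shows up in the one log both objects refer to. A field that encapsulates both aspects has no such local answer, which is the point made above.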
I've read and heard a lot of good things about immutability, so I decided to try it out in one of my hobby projects. I declared all of my fields as readonly, and made all methods that would usually mutate an object to return a new, modified version.
It worked great until I ran into a situation where a method should, by external protocol, return certain information about an object without modifying it, but at the same time could be optimized by modifying the internal structure. In particular, this happens with tree path compression in a union-find algorithm.
When the user calls int find(int n), the object appears unmodified to the outsider. It represents the same entity conceptually, but its internal fields are mutated to optimize the running time.
How can I implement this in an immutable way?
Short answer: you have to ensure the thread-safety by yourself.
The readonly keyword on a field gives you the guarantee that the field cannot be reassigned after the object containing this field has been constructed.
So the only write you can have for this field is contained in the constructor (or in the field initialization), and a read through a method call cannot occur before the object is constructed, hence the thread-safety of readonly.
If you want to implement caching, you break the assumption that only one write occurs (since "caching writes" can and will occur during your reads), and thus there can be threading problems in bad cases (for example, if you're reading lines from a file, two threads can call the find method with the same parameter but read two different lines and therefore get different results).
What you want to implement is observational immutability. This related question about memoization may help you with an elegant answer.
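To make that concrete, here is a minimal sketch of an observationally immutable union-find, assuming a plain lock is acceptable for thread-safety. The DisjointSet name and its members are hypothetical. Union returns a new instance, while Find only rewrites internal parent links and never changes which set an element belongs to.

```csharp
using System;

public sealed class DisjointSet
{
    // The reference is readonly; the array contents are rewritten by Find.
    private readonly int[] parent;
    private readonly object gate = new object();

    public DisjointSet(int size)
    {
        parent = new int[size];
        for (int i = 0; i < size; i++) parent[i] = i;
    }

    private DisjointSet(int[] parent)
    {
        this.parent = parent;
    }

    public int Find(int n)
    {
        // Path compression writes to the array, so readers are serialized.
        lock (gate)
        {
            int root = n;
            while (parent[root] != root) root = parent[root];

            // Point every node on the path directly at the root.
            while (parent[n] != root)
            {
                int next = parent[n];
                parent[n] = root;
                n = next;
            }
            return root;
        }
    }

    public DisjointSet Union(int a, int b)
    {
        int rootA = Find(a);
        int rootB = Find(b);
        int[] copy;
        lock (gate) { copy = (int[])parent.Clone(); }
        copy[rootA] = rootB; // no effect when both are already in one set
        return new DisjointSet(copy);
    }
}
```

Callers can never observe the compression: Find(n) returns the same root before and after it. Copying the whole array in Union is the simplest way to keep the old instance intact, not the most efficient one.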
I have a List of objects, which can be accessed by multiple users from a WebService. However, the number of objects in the list is steadily growing, so I need some memory management.
I would like to remove from the list all elements that are not used by any user. However, I cannot do this simply by calling the GC, because there is still one reference (the one from the List), and I don't know how to get the number of references to an object.
So, is there a way to clear all objects that have just one reference? Or to get the number of references? Or to determine whether there is no other reference outside the List? Any solution is welcome.
You can use a so-called Weak List.
Basically a weak list is a list whose references are "ignored" by the GC. So while there is still a reference from the list, it will not be counted and (depending on which implementation of weak list you use) the item will be removed automatically at one point from the list.
Unfortunately there is no direct implementation of a weak list in the .NET Framework. There is the ConditionalWeakTable though which you might be able to use like a list and there are several examples for weak lists on the web which use the WeakReference type or similar mechanisms.
Examples:
Is there a way to do a WeakList or WeakCollection (like WeakReference) in CLR?
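A minimal sketch of such a list built on WeakReference<T>. The WeakList<T> class and its GetLiveItems method are hypothetical names, and the sketch is not thread-safe, so a web service would need to add locking around it.

```csharp
using System;
using System.Collections.Generic;

public sealed class WeakList<T> where T : class
{
    private readonly List<WeakReference<T>> items = new List<WeakReference<T>>();

    public void Add(T item)
    {
        // The list holds only a weak reference, so it does not keep item alive.
        items.Add(new WeakReference<T>(item));
    }

    // Returns the items that are still alive and drops the entries
    // whose targets have already been collected.
    public List<T> GetLiveItems()
    {
        var live = new List<T>();
        for (int i = items.Count - 1; i >= 0; i--)
        {
            T target;
            if (items[i].TryGetTarget(out target))
                live.Add(target);
            else
                items.RemoveAt(i);
        }
        live.Reverse(); // restore insertion order
        return live;
    }
}
```

When exactly an unreferenced item disappears depends on when the garbage collector runs, so the list may still report an item for a while after the last user has let go of it.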
I was using a custom method to deep clone objects the other day, and I know you can deep clone in different ways (reflection, binary serialization, etc), so I was just wondering:
What is/are the reason(s) that Microsoft does not include a deep copy method in the framework?
The problem is substantially harder than you seem to realize, at least in the general case.
For starters, a copy isn't just deep or shallow, it's a spectrum.
Let's imagine for a second that we have a list of arrays of strings, and we want to make a copy of it.
We start out at the shallowest level: we just copy the reference to the whole thing into another variable. Any changes to the list referenced from either variable are seen by the other.
So now we go and create a brand new list to give to the second variable. For each item in the first list we add it to the second list. Now we can modify the list referenced from either variable without it being seen by the other one. But what if we grab the first item of a list and change the string that's in the first array? That change will be seen by both lists!
Now we're going through and creating a new list, and for each array in the first list we're creating a new array, adding each of the strings in the underlying array to the new array, and adding each of those new arrays to the new list. Now we can mutate any of the arrays in either list without seeing the changes. But wait, both lists are still referencing the same strings (which are reference types after all; they internally have a character array for their data). What if some mean person were to come along and mutate one of the strings (using unsafe code you could actually do this)! So now you're copying all of the strings with a deep copy. But what if we don't need to do that? What if we know that nobody is so mean that they would mutate the string? Or, for that matter, what if we know that none of the arrays will be mutated (or that if they will be, the changes are supposed to be reflected by both lists)?
Then of course there are problems such as circular references, and fields in a class that don't really represent its state (i.e. possibly cached values or derived data that could just be re-calculated as needed by a clone).
Realistically you'd need to have every type implement ICloneable or some equivalent, and have its own custom code for how to clone itself. This would be a lot of work to maintain for a language, especially since there are so many ways that complex objects could possibly be cloned. The cost would be quite high, and the benefits (outside of a handful of objects that it is deemed worthwhile to implement clone methods for) are generally not worth it. You, as a programmer, can write your own logic for cloning a type based on how deep you know you need to go.
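The first three levels of the list-of-arrays example can be sketched like this. CopyLevels and its method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CopyLevels
{
    // Level 0: another reference to the same list.
    public static List<string[]> SameReference(List<string[]> source)
    {
        return source;
    }

    // Level 1: a new list that still holds the same arrays.
    public static List<string[]> NewList(List<string[]> source)
    {
        return new List<string[]>(source);
    }

    // Level 2: a new list and new arrays, still the same string instances.
    public static List<string[]> NewListAndArrays(List<string[]> source)
    {
        return source.Select(array => (string[])array.Clone()).ToList();
    }
}
```

Each level is a legitimate "copy" for some caller, which is why no single built-in method could pick the right one for everybody.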
It's similar to how it works (or doesn't work) in C and C++:
To do a deep copy, you actually have to know how different data is interpreted. In trivial cases, a shallow copy (which is provided) is the same as a deep copy. But once this is no longer true, it really depends on the implementation and interpretation. There's no general rule of thumb.
Let's use a game as a simple example:
A NPC object has two integers as members. One integer represents its health points, the other one is its unique ID.
If you clone the NPC, you have to keep the amount of health, while changing the unique ID. This is something the compiler/runtime can't determine on their own. You have to code this, essentially telling the program "how to copy".
I can think of two possible solutions:
Add a keyword to denote things that can't be copied. While this sounds like a good idea, it doesn't really solve the issue. You can tell the compiler that UniqueID must not be copied, but at the same time you can't define what should happen instead. And even if you could, you could just...
Create a copy constructor (C++) or a method to copy/clone the object (C#, e.g. CopyTo()).
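For the NPC example, the second option could look roughly like this in C#. The Npc class, its members and the counter-based ID scheme are assumptions made for the sketch.

```csharp
using System.Threading;

public class Npc
{
    private static int nextId = 0;

    public int Health { get; set; }
    public int UniqueId { get; private set; }

    public Npc(int health)
    {
        Health = health;
        // Every instance, including clones, gets a fresh ID.
        UniqueId = Interlocked.Increment(ref nextId);
    }

    // The clone keeps the health but must not keep the ID.
    public Npc Clone()
    {
        return new Npc(Health);
    }
}
```

The knowledge that Health is copied and UniqueId is regenerated lives in Clone(), which is exactly the part the compiler or runtime cannot work out on its own.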
Hmm.. My view is that:
A) because very rarely you want to have the copy really deep
B) because the framework cannot guarantee to know how to truly and meaningfully CLONE an object
C) because implementing deep-cloning in a naive way is simple and takes one method and several lines of code using reflection and recursion
but I'll try to find an old MSDN article that covered that
edit: I've not found :( I'm still sure that I saw it somewhere, but I cannot google-it-out now.. However some useful links about related ICloneable and derived:
http://blogs.msdn.com/b/brada/archive/2004/05/03/125427.aspx
http://blogs.msdn.com/b/mrtechnocal/archive/2009/10/19/why-not-icloneable-t.aspx
https://stackoverflow.com/a/3712564/717732
So, as I've not found the author's words, let me expand the points:
A: because very rarely you want to have the copy really deep
You see, how can the framework guess how deep it should be in general? Let's assume the answer is completely deep, and let's assume it has been implemented. Now we have memberwise-clone and total-clone methods. Still, there are some cases when people will need clone-me-but-not-the-root-base. So they post more questions asking why total-clone has no way of cutting off the raw base. Or second-to-raw. Etc. Providing deep-clone solves almost nothing from the .Net team's point of view, as we, the users, will still rant about that just because we see some partial tools and are lazy and want to have everything:)
B) because the framework cannot guarantee to know how to truly and meaningfully CLONE an object
Especially with some special objects with handles or native-like IDs like those from Entity Framework, .Net Remoting Proxies, COM-wrappers etc: You might successfully read and clone the upper class hierarchy layers, but eventually, somewhere below you find some arcane thingies like IntPtrs that you just know that you should not copy. Most of the time. But sometimes you can. But the framework's code must be universal. Deep-cloning would either have to be harshly complicated with many sanity checks against special-looking class members, or it would produce dangerous results if the programmer invoked it on something that has base classes that the programmer did not care to analyze.
B+) Also, please note that the more base classes you have in your tree, the more likely it is that they will have some parameterized constructors, which might indicate that direct-copying is not a good idea. Directly copyable classes usually have parameterless constructors and all the copyable data accessible by properties.
B++) From the framework designer's point of view, taking memory and speed concerns into account, shallow copying is almost always very fast, while deep copying is just the opposite. It is beneficial to the framework's and platform's reputation to NOT allow the developers to freely deep-copy huge objects. Anyways, would you need a deep-copy if your object was lightweight and simple, huh? :) Not providing a deep-copy encourages the developers to think around the need for a deep-copy, which usually makes the application lighter and faster.
C) because implementing deep-cloning in a naive way is simple and takes one method and several lines of code using reflection and recursion
Having a shallow copy, how hard is it to actually write a deep copy? Not so hard! Just implement a method that is given an object 'obj':
pseudocode:
object deepcopier(object obj)
    newobject = obj.shallowcopy()
    foreach(field in newobject.fields)
        newobject.field = deepcopier(newobject.field)
    return newobject
and well, that's all. Of course the field enumeration must be performed by Reflection, and also reading/writing the fields - too.
However, this way is very naive. It has a serious flaw: what if some object has two fields that point to the same other object? We should detect it and do the cloning once, then assign both fields to that one clone. Also if an object pointed to by some field has a reference to some object that is also pointed to by another object (...) - that may also need to be tracked and cloned only once. Also, how about cycles? What if somewhere deep in the tree, an object has a reference back to the root? An algorithm like the one above would happily descend and re-copy everything once again, then again, and eventually would choke with a StackOverflow.
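Here is a sketch of the same idea in C# with that tracking added: a dictionary keyed by reference identity remembers every object already cloned, so shared objects are cloned once and cycles terminate. NaiveDeepCopier is a hypothetical name, and it is still naive: it blindly copies every field, handles only single-dimensional arrays, and knows nothing about handles, delegates or other members that should not be copied.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class NaiveDeepCopier
{
    private const BindingFlags FieldFlags = BindingFlags.Instance | BindingFlags.Public
        | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    private static readonly MethodInfo CloneMethod = typeof(object).GetMethod(
        "MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic);

    public static T Copy<T>(T source)
    {
        var visited = new Dictionary<object, object>(new ReferenceComparer());
        return (T)CopyInternal(source, visited);
    }

    private static object CopyInternal(object obj, Dictionary<object, object> visited)
    {
        if (obj == null) return null;
        Type type = obj.GetType();
        // Primitives, enums and strings are treated as immutable and shared.
        if (type.IsPrimitive || type.IsEnum || type == typeof(string)) return obj;

        // Already cloned (shared object or cycle): reuse the existing clone.
        object existing;
        if (visited.TryGetValue(obj, out existing)) return existing;

        if (type.IsArray)
        {
            var sourceArray = (Array)obj;
            if (sourceArray.Rank != 1) throw new NotSupportedException("1-D arrays only");
            var arrayCopy = (Array)sourceArray.Clone();
            visited[obj] = arrayCopy;
            for (int i = 0; i < arrayCopy.Length; i++)
                arrayCopy.SetValue(CopyInternal(sourceArray.GetValue(i), visited), i);
            return arrayCopy;
        }

        object copy = CloneMethod.Invoke(obj, null); // the shallow copy
        visited[obj] = copy;                         // register before recursing
        for (Type t = type; t != null; t = t.BaseType)
        {
            foreach (FieldInfo field in t.GetFields(FieldFlags))
                field.SetValue(copy, CopyInternal(field.GetValue(obj), visited));
        }
        return copy;
    }

    // Compares keys by identity, ignoring any Equals/GetHashCode overrides.
    private sealed class ReferenceComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y)
        {
            return ReferenceEquals(x, y);
        }

        int IEqualityComparer<object>.GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }
}
```

Registering the clone in the dictionary before descending into its fields is what breaks the cycle: when the recursion comes back around to the root, it finds the clone already there.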
This makes the cloning quite hard to be tracked and starts to look more like serialization. In fact if your class is a DataContract or Serializable, you can simply serialize it and deserialize to get a perfect deep copy :)
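A minimal sketch of that round trip using DataContractSerializer. The Settings type and the SerializationCloner helper are hypothetical. With default settings this serializer does not preserve shared references inside the graph, so treat it as an illustration of the idea and not as a general-purpose cloner.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical data-only type that is safe to serialize.
[DataContract]
public class Settings
{
    [DataMember] public string Name { get; set; }
    [DataMember] public List<int> Values { get; set; }
}

public static class SerializationCloner
{
    // Writes the object to an in-memory stream and reads it back as a new graph.
    public static T DeepCopy<T>(T source)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, source);
            stream.Position = 0;
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

Only members marked [DataMember] survive the round trip, which is one way of stating which parts of the object are really its data.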
Deep-cloning is hard to do in a universal way, unless you know what the object means and what all its fields mean and know which ones should really be cloned and which should be unified. If you, as developer, know that this is just a data-object that is perfectly safe to deep-clone, so whydontya just make it Serializable? If you can't make it Serializable, then probably you also can't deep-clone it!
Suppose I have a big object that contains the properties I require and additionally several methods and a few sizeable collections.
I would like to know what would cost more: To pass as an argument this big object that already exists, or create a small object containing only the handful of properties I require and pass that?
If you're just passing it as an argument to a method, passing the "big" object is "cheaper" - you'll only be passing a copy of the object reference, which is just the size of a pointer (unless the object is of a struct type, in which case the whole "object" is copied into the stack). If you were to create a new object, even if it's small, you'd be paying the price of the allocation and copying of the properties onto it, which you don't have if you pass the "large" object.
If you're passing it in a way that it needs to be serialized (i.e., across applications, in a call to a web service, etc.), then having the smaller object (essentially a DTO, Data Transfer Object) is preferred.
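A small sketch of both cases. BigObject, SummaryDto and the Passing helper are hypothetical names chosen for the example.

```csharp
using System.Collections.Generic;

public class BigObject
{
    public string Name { get; set; }
    public int Priority { get; set; }
    public List<byte[]> Attachments = new List<byte[]>(); // the sizeable part
}

// DTO carrying only what has to cross a serialization boundary.
public class SummaryDto
{
    public string Name { get; set; }
    public int Priority { get; set; }
}

public static class Passing
{
    // In-process call: the parameter is just another reference to the same
    // instance, so nothing the size of BigObject is copied.
    public static bool IsSameInstance(BigObject argument, BigObject original)
    {
        return object.ReferenceEquals(argument, original);
    }

    // Out-of-process call: pay for one small allocation instead of
    // serializing the whole object.
    public static SummaryDto ToDto(BigObject source)
    {
        return new SummaryDto { Name = source.Name, Priority = source.Priority };
    }
}
```

Passing.IsSameInstance(big, big) returns true however large the Attachments list is, while ToDto is only worth its allocation when the data actually has to be serialized.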
As long as the object is a reference type, so that only a reference is passed, it doesn't matter. Therefore you shouldn't introduce additional complexity. In fact, it would cost the machine more, since it then would have to create that container object as well.
Since you seem concerned with performance I'd recommend learning how pointers and memory management work in C. Once you understand that, you will have a much easier time understanding how their abstracted versions in higher level languages impact performance.
Is there a way to iterate over instances of a class in C#? These instances are not tracked or managed in a collection.
Not inside the regular framework. You would need to track them manually.
You can, however, do this in windbg/sos - mainly for debugging purposes (not for routine code).
You have to have references to them somewhere, or at least know where to look, so in identifying them you'd probably put them into a collection which you'd then iterate.
If you don't know where the references live, then you'd have to introduce some kind of tracking mechanism. Perhaps a static collection on the type? It would have to be implemented carefully though.
Not directly.
You could conceptually have your object place a copy of itself into some well-known place (e.g. a static collection) and then use that to iterate, but then you'd have to make sure you cleared the instance out of that collection at some point or else it'll never get garbage collected.
In the comment thread on this post there is an interesting discussion and solution related to this question.
As Marc said, if you want to do it in code, you would need to keep a collection of them. If you are debugging, have a look at this blog post: http://blogs.msdn.com/tess/archive/2006/01/23/516139.aspx
If you need a collection in memory of all of the object instances of a certain type, you could consider using a collection of System.WeakReference objects. A weak reference is a reference that does not keep the object it references alive. This would let you keep a collection of weak references to the object instances you want to enumerate. Have a look at WeakReference in the help for more info.
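Putting the suggestions above together, a type can register each new instance in a static collection of weak references, so the registry never prevents garbage collection. Tracked and LiveInstances are hypothetical names for this sketch.

```csharp
using System;
using System.Collections.Generic;

public class Tracked
{
    // Static registry of weak references; it does not keep instances alive.
    private static readonly List<WeakReference<Tracked>> instances =
        new List<WeakReference<Tracked>>();
    private static readonly object gate = new object();

    public string Name { get; private set; }

    public Tracked(string name)
    {
        Name = name;
        lock (gate)
        {
            instances.Add(new WeakReference<Tracked>(this));
        }
    }

    // Returns the instances that are still alive and prunes dead entries.
    public static List<Tracked> LiveInstances()
    {
        var live = new List<Tracked>();
        lock (gate)
        {
            for (int i = instances.Count - 1; i >= 0; i--)
            {
                Tracked target;
                if (instances[i].TryGetTarget(out target))
                    live.Add(target);
                else
                    instances.RemoveAt(i);
            }
        }
        return live;
    }
}
```

The list returned by LiveInstances() holds strong references, so instances stay alive for as long as the caller keeps that list around.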