Deep cloning related question - C#

I have implemented an evolutionary algorithm in C# which kind of works, but I suspect that the cloning does not. The algorithm maintains a population of object trees. Each object (tree) can be the result of cloning (e.g. after ‘natural selection’) and should be a unique object consisting of unique objects. Is there a simple way to determine whether the ‘object population’ contains unique/distinct objects, in other words whether any objects are shared by more than one tree? I hope my question makes sense.
Thanks.
Best wishes,
Christian
PS: I implemented (I think) deep-copy cloning via serialization; see:
http://www.codeproject.com/KB/tips/SerializedObjectCloner.aspx

The way to verify whether two variables refer to the same object in memory is to compare them using Object.ReferenceEquals. This checks whether the references ("pointers") are the same.
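A minimal sketch of how reference identity can answer the original question: walk every tree in the population and record each node in a set that compares by reference, so any node reached a second time is shared. `Node`, `IdentityComparer` and `SharingCheck.FindSharedNodes` are hypothetical names standing in for the question's own tree classes; on .NET 5 and later, the built-in ReferenceEqualityComparer.Instance could replace the hand-written comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical tree node type, standing in for the question's tree classes.
public class Node
{
    public List<Node> Children { get; } = new List<Node>();
}

// Compares objects by identity only, ignoring any Equals/GetHashCode overrides.
public sealed class IdentityComparer : IEqualityComparer<object>
{
    public new bool Equals(object x, object y) => ReferenceEquals(x, y);
    public int GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

public static class SharingCheck
{
    // Returns nodes reachable from more than one tree (or more than once in
    // the same tree). Each extra visit to an already-seen node is recorded.
    public static List<Node> FindSharedNodes(IEnumerable<Node> population)
    {
        var seen = new HashSet<object>(new IdentityComparer());
        var shared = new List<Node>();
        foreach (Node root in population)
        {
            var stack = new Stack<Node>();
            stack.Push(root);
            while (stack.Count > 0)
            {
                Node node = stack.Pop();
                if (!seen.Add(node))
                {
                    shared.Add(node); // this exact instance was already visited
                    continue;         // don't walk its subtree a second time
                }
                foreach (Node child in node.Children)
                    stack.Push(child);
            }
        }
        return shared;
    }
}
```

Because the set hashes with RuntimeHelpers.GetHashCode, two trees that are equal by value but built from separate instances are not reported, which is exactly the difference between a correct deep clone and a shallow one.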

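Cloning in C# is a shallow copy by default (Object.MemberwiseClone copies field values, so reference-type fields in the copy point at the same objects as the original). The keyword you probably need to google for tutorials is "deep cloning", which creates object graphs that don't share references.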
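To see why, here is a sketch of a shallow copy made with Object.MemberwiseClone; `TreeNode` and `ShallowCopy` are hypothetical names. The copy is a new object, but its Children field still points at the original's list, so the two trees share every child.

```csharp
using System.Collections.Generic;

// Hypothetical tree type used only to illustrate shallow copying.
public class TreeNode
{
    public string Label = "";
    public List<TreeNode> Children = new List<TreeNode>();

    // MemberwiseClone copies each field: the Label and Children references
    // are copied, not the string or the list they point to.
    public TreeNode ShallowCopy() => (TreeNode)MemberwiseClone();
}
```

Adding a child through the copy's Children list also changes the original, because both fields refer to the same List<TreeNode> instance.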

What about the following: add a static counter to every class whose instances are referenced as members of your 'main tree class', and increment it in every constructor. Then determine how many of these 'subobjects' should be contained in a tree object (or in all tree objects) and compare that number to the counter.
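A minimal sketch of the counter idea, assuming a hypothetical subobject class `Gene` (the name is illustrative only). Interlocked.Increment keeps the count correct if several threads build trees at once. Two caveats: the counter records how many instances were ever constructed, not how many are still alive, and serialization-based cloning (for example with BinaryFormatter) may create objects without running their ordinary constructors, so clones made that way might not be counted at all.

```csharp
using System.Threading;

// Hypothetical subobject class; every constructor call bumps a shared counter.
public class Gene
{
    private static int _instancesCreated;

    // Total number of Gene objects constructed so far (not the number alive).
    public static int InstancesCreated => Volatile.Read(ref _instancesCreated);

    public int Value;

    public Gene(int value)
    {
        Value = value;
        Interlocked.Increment(ref _instancesCreated); // thread-safe increment
    }

    // Copy constructor for manual cloning; it chains to the counting constructor.
    public Gene(Gene other) : this(other.Value) { }
}
```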

OK, first let me check that I understand you correctly:
RESULT is an object tree that holds some data.
GENERATION is a collection of RESULT objects.
You have some 'evolution' method that moves each GENERATION to the next step.
If you want to check whether one RESULT is equal to another, you should implement IEquatable<T> on it (overriding Equals and GetHashCode), and do the same for each of its members.
ADDITION:
Try to get rid of that kind of cloning and write the clones manually; it WILL be faster. Speed is crucial for you here, because every heuristic ultimately comes down to raw computing power.
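A sketch of a hand-written deep clone, assuming the hypothetical `TreeNode` below is a plain tree with no cycles and no nodes that are meant to be shared. Each call allocates a new node and recursively clones its children, so the result shares no node or list with the source.

```csharp
using System.Collections.Generic;

// Hypothetical tree node; DeepClone is written by hand instead of via serialization.
public class TreeNode
{
    public double Fitness;
    public List<TreeNode> Children = new List<TreeNode>();

    public TreeNode DeepClone()
    {
        var copy = new TreeNode { Fitness = Fitness }; // copy value fields directly
        foreach (TreeNode child in Children)
            copy.Children.Add(child.DeepClone());      // recurse so no child is shared
        return copy;
    }
}
```

If the real node types also hold references to shared, immutable data (lookup tables, configuration), those fields can simply be copied as references; only mutable per-individual state needs to be cloned.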

If what you're asking is "how do I check whether two object variables refer to the same bit of memory?", you can call Object.ReferenceEquals.

Related

C# Object References vs. Objects in a List

I asked a question similar to this one a few days ago: C# Does an array of objects store the pointers of said objects?
Now, what I'm asking is a bit more code-related. A brief overview: I'm trying to store many objects in a list, and those objects may be very large (1 to 2 GB). From my research, a single .NET object (including a collection's backing array) is limited to 2 GB by default. So my question is: am I able to fit more than 1 or 2 of these objects into a list?
Scenario: I have a Human class and I'm trying to store instances of it in a list.
Are these two lines of code different?
List<Human> humanList = new List<Human>();
Human human = new Human(attr1, attr2, attr3);
humanList.Add(human);
vs.
List<Human> humanList = new List<Human>();
humanList.Add(new Human(attr1, attr2, attr3));
Is the first version storing a reference to the human object, so that I'll be able to store more objects in the list? Conversely, is the second version trying to store the whole Human object in the list?
Your two code samples are equivalent (assuming you aren't doing anything else with the human variable after you create it). The new keyword creates the object and yields a reference to it, and that reference is what gets stored in the list; the second version just passes the reference inline instead of storing it in a variable first.
Assuming Human is a class and not a struct, the list will contain references to the objects, not the objects themselves.
The size of a Human object has little bearing on how many references you can store in the list (overall system memory is likely to be more of an issue than the 2 GB .NET limit).
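A small sketch that makes the reference semantics visible, using a hypothetical one-field Human class in place of the question's attr1..attr3 constructor. Because Human is a class, the local variable and the list slot hold two copies of the same reference, so a change made through one is visible through the other. Each list slot costs one pointer-sized reference, no matter how large the Human object is.

```csharp
using System.Collections.Generic;

// Minimal stand-in for the question's Human class (the Name field is hypothetical).
public class Human
{
    public string Name;
    public Human(string name) { Name = name; }
}

public static class ListDemo
{
    public static (List<Human> list, Human local) Build()
    {
        var humanList = new List<Human>();
        Human human = new Human("Ada");
        humanList.Add(human);              // stores a copy of the reference
        humanList.Add(new Human("Grace")); // same effect, without a named local
        return (humanList, human);
    }
}
```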
Both code examples have the same effect.
Both examples store references to your objects.
You will be able to store many more than 1 or 2 objects in the list.
This is handled by the .NET CLR: in either scenario, what ends up in the list is a reference to the Human object. The only difference is that the first example also keeps that reference in a local variable within the method.

.Net Collection for atomizing T?

I am looking for a pre-existing .NET 'HashSet-type' implementation suitable for atomizing a general type T. We have a large number of identical objects coming in from serialized sources that need to be atomized to conserve memory.
A Dictionary<T,T> with value == key works perfectly; however, the objects in these collections can run into the millions across the app, so it seems very wasteful to store 2 references to every object.
HashSet cannot be used, as it only has Contains; there is no way to get at the actual member instance.
Obviously I could roll my own, but I wanted to check whether there was anything pre-existing. A scan of C5 didn't turn up anything, but their 250+ page documentation does make me wonder if I've missed something.
EDIT: The fundamental idea is that I need to be able to GET THE UNIQUE OBJECT BACK, i.e. HashSet has Contains(T obj) but not Get(T obj). /EDIT
The collection at worst only needs to implement:
T GetOrAdd(T candidate)
void Clear()
and take an arbitrary comparer (IEqualityComparer<T>).
GetOrAdd should be ~O(1) and ideally atomic, i.e. it shouldn't waste time hashing twice.
EDIT: Failing an existing implementation, any recommendations on sources for the basic hashing/bucketing mechanics would be appreciated. The Mono HashSet source has been pointed out for this, so this part is answered. /EDIT
You can take the source code of HashSet<T> from the Reference Source and write your own GetOrAdd method.
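If rolling your own is the route, the following is a minimal sketch of the idea, assuming a hypothetical `Interner<T>` class loosely modelled on the chained-bucket layout that HashSet<T> uses (it is not copied from Reference Source or Mono). GetOrAdd computes the hash once, walks a single bucket, and either returns the instance already stored or stores the candidate, so each object costs one reference plus a hash code and a next index. It is not thread-safe. As an aside, on .NET Framework 4.7.2 and .NET Core 2.0 or later, HashSet<T>.TryGetValue returns the stored instance, so GetOrAdd can also be written as TryGetValue followed by Add, at the cost of hashing twice on a miss.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interning set: one stored reference per distinct value.
public sealed class Interner<T> where T : class
{
    private struct Entry { public int Hash; public int Next; public T Value; }

    private readonly IEqualityComparer<T> _comparer;
    private int[] _buckets;   // 1-based index into _entries; 0 means empty bucket
    private Entry[] _entries;
    private int _count;

    public Interner(IEqualityComparer<T> comparer = null, int capacity = 16)
    {
        _comparer = comparer ?? EqualityComparer<T>.Default;
        if (capacity < 1) capacity = 1;
        _buckets = new int[capacity];
        _entries = new Entry[capacity];
    }

    public int Count => _count;

    // Returns the stored instance equal to candidate, or stores candidate itself.
    public T GetOrAdd(T candidate)
    {
        if (candidate == null) throw new ArgumentNullException(nameof(candidate));
        int hash = _comparer.GetHashCode(candidate) & 0x7FFFFFFF; // hashed exactly once
        int bucket = hash % _buckets.Length;
        for (int i = _buckets[bucket] - 1; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].Hash == hash && _comparer.Equals(_entries[i].Value, candidate))
                return _entries[i].Value;
        }
        if (_count == _entries.Length)
        {
            Resize();
            bucket = hash % _buckets.Length;
        }
        _entries[_count] = new Entry { Hash = hash, Next = _buckets[bucket] - 1, Value = candidate };
        _buckets[bucket] = _count + 1;
        _count++;
        return candidate;
    }

    public void Clear()
    {
        Array.Clear(_buckets, 0, _buckets.Length);
        Array.Clear(_entries, 0, _count); // drop references so the GC can collect them
        _count = 0;
    }

    // Doubles capacity and relinks every entry using its cached hash code.
    private void Resize()
    {
        int newSize = _entries.Length * 2;
        var newEntries = new Entry[newSize];
        Array.Copy(_entries, newEntries, _count);
        var newBuckets = new int[newSize];
        for (int i = 0; i < _count; i++)
        {
            int b = newEntries[i].Hash % newSize;
            newEntries[i].Next = newBuckets[b] - 1;
            newBuckets[b] = i + 1;
        }
        _buckets = newBuckets;
        _entries = newEntries;
    }
}
```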

Is testing generic collections for referential equality in C# a silly idea?

I'm implementing a special case of an immutable dictionary, which for convenience implements IEnumerable<KeyValuePair<Foo, Bar>>. Operations that would ordinarily modify the dictionary should instead return a new instance.
So far so good. But when I try to write a fluent-style unit test for the class, I find that neither of the two fluent assertion libraries I've tried (Should and Fluent Assertions) supports the NotBeSameAs() operation on objects that implement IEnumerable -- not unless you first cast them to Object.
When I first ran into this with Should, I assumed it was just a hole in the framework, but when I saw that Fluent Assertions had the same hole, it made me think that (since I'm a relative newcomer to C#) I might be missing something conceptual about C# collections; the author of Should implied as much when I filed an issue.
Obviously there are other ways to test this -- cast to Object and use NotBeSameAs(), just use Object.ReferenceEquals, whatever -- but if there's a good reason not to, I'd like to know what that is.
An IEnumerable<T> is not necessarily a real, materialized collection. IEnumerable<T> only guarantees that you can enumerate through its elements. In simple cases you have a container class like a List<T> that is already materialized; then you could compare both lists' references. However, your IEnumerable<T> might also represent a sequence of commands that will be executed once you enumerate it. Basically a state machine:
public IEnumerable<int> GetInts()
{
    yield return 10;
    yield return 20;
    yield return 30;
}
If you save this in a variable, you don't have a comparable object (everything is an object, so you do... but it's not meaningful):
var x = GetInts();
Your comparison is only meaningful for materialized IEnumerables (.ToList() or .ToArray()), because those state machines have been evaluated and their results have been saved to a collection. So yes, the library's behaviour actually makes sense: if you know you have materialized IEnumerables, you need to make that knowledge explicit by casting them to Object and calling the desired function on that object "manually".
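A short sketch of the point about deferred sequences, reusing the GetInts iterator from above inside a hypothetical `SequenceDemo` class. Each call creates a new iterator object, so two calls never yield the same reference even though they produce the same values; reference identity tells you about the objects, not about the sequences.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SequenceDemo
{
    // Deferred sequence: each call returns a new compiler-generated iterator object.
    public static IEnumerable<int> GetInts()
    {
        yield return 10;
        yield return 20;
        yield return 30;
    }

    // Content comparison, as opposed to identity comparison.
    public static bool SameContents(IEnumerable<int> a, IEnumerable<int> b) => a.SequenceEqual(b);
}
```

For example, object.ReferenceEquals(SequenceDemo.GetInts(), SequenceDemo.GetInts()) is false, while SequenceDemo.SameContents on the same two calls is true.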
In addition to what Jon Skeet suggested, take a look at this February 2013 MSDN article by Ted Neward:
.NET Collections, Part 2: Working with C5
Immutable (Guarded) Collections
With the rise of functional concepts
and programming styles, a lot of emphasis has swung to immutable data
and immutable objects, largely because immutable objects offer a lot
of benefits vis-à-vis concurrency and parallel programming, but also
because many developers find immutable objects easier to understand
and reason about. Corollary to that concept, then, follows the concept
of immutable collections—the idea that regardless of whether the
objects inside the collection are immutable, the collection itself is
fixed and unable to change (add or remove) the elements in the
collection. (Note: You can see a preview of immutable collections
released on NuGet in the MSDN Base Class Library (BCL) blog at
bit.ly/12AXD78.)
It describes the use of an open source library of collection goodness called C5.
Look at http://itu.dk/research/c5/

Library for traversing object property tree on C#

I want to have a method that could traverse an object by property names and get me the value of the property.
More specifically, as input I have a string like "Model.Child.Name", and I want this method to take an object and get me the value that could be found programmatically via: object.Model.Child.Name.
I understand that the only way to do this is to use Reflection, but I don't want to write this code on my own, because I believe there are pitfalls. Moreover, I think it is a fairly common task.
Is there any well-known implementation of an algorithm like that in C#?
Reflection is the way to go.
Reflection to access properties at runtime
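A minimal reflection-based sketch of the path lookup the question describes; `PropertyPath.GetValue` is a hypothetical helper, not a library API. It only follows public instance properties, uses each value's runtime type (so derived types work), and returns null as soon as an intermediate value is null. It does not handle fields or indexers, and Type.GetProperty may throw AmbiguousMatchException for properties hidden with `new`.

```csharp
using System;
using System.Reflection;

public static class PropertyPath
{
    // Follows a dotted path such as "Model.Child.Name" through public instance
    // properties. Returns null as soon as an intermediate value is null.
    public static object GetValue(object root, string path)
    {
        if (root == null) throw new ArgumentNullException(nameof(root));
        object current = root;
        foreach (string name in path.Split('.'))
        {
            if (current == null) return null;
            PropertyInfo prop = current.GetType().GetProperty(
                name, BindingFlags.Public | BindingFlags.Instance);
            if (prop == null)
                throw new ArgumentException(
                    $"'{current.GetType().Name}' has no public property '{name}'.", nameof(path));
            current = prop.GetValue(current, null);
        }
        return current;
    }
}
```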
You can take a look at ObjectDumper and modify the source code as per your requirement.
ObjectDumper takes a .NET object and dumps it to a string, a file, a TextWriter, etc.
This is not that difficult to write. Yes, there are some pitfalls, but it's good to know them.
The algorithm is straightforward: it traverses a tree structure. At each node you check for a primitive value (int, string, char, etc.); if it's not one of these types, then it's a structure that holds one or more primitives and needs to be traversed down to its primitives.
The pitfalls are dealing with nulls, nullable types, value versus reference types, etc. Straightforward stuff that every developer should know about.
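A sketch of the traversal described above, under a few stated assumptions: `LeafWalker` is a hypothetical name, "leaf" here means a primitive, enum, string, decimal or DateTime, and only public instance properties are followed. It addresses some of the listed pitfalls: nulls are reported rather than dereferenced, a boxed Nullable<T> arrives either as null or as its underlying value, and reference-type nodes are tracked by identity so cycles do not recurse forever. Collections are not expanded (indexers are skipped), and a property whose getter throws will propagate the exception.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class LeafWalker
{
    // Collects "path = value" lines for every leaf reachable through
    // public instance properties.
    public static List<string> Walk(object root)
    {
        var lines = new List<string>();
        Visit(root, "", lines, new HashSet<object>(new IdentityComparer()));
        return lines;
    }

    private static bool IsLeaf(Type t) =>
        t.IsPrimitive || t.IsEnum || t == typeof(string) ||
        t == typeof(decimal) || t == typeof(DateTime);

    private static void Visit(object value, string path, List<string> lines, HashSet<object> visited)
    {
        if (value == null) { lines.Add($"{path} = null"); return; }
        Type type = value.GetType(); // a boxed Nullable<T> shows up as T (or as null above)
        if (IsLeaf(type)) { lines.Add($"{path} = {value}"); return; }
        // Value types are boxed afresh on every read, so only reference types are identity-checked.
        if (!type.IsValueType && !visited.Add(value)) return; // cycle or shared reference

        foreach (PropertyInfo prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (prop.GetIndexParameters().Length > 0) continue; // skip indexers
            string childPath = path.Length == 0 ? prop.Name : path + "." + prop.Name;
            Visit(prop.GetValue(value, null), childPath, lines, visited);
        }
    }

    private sealed class IdentityComparer : IEqualityComparer<object>
    {
        public new bool Equals(object x, object y) => ReferenceEquals(x, y);
        public int GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
    }
}
```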

ArrayList can't compare objects after they are loaded from disk

To make it easy, let's say I have an ArrayList allBooks containing objects of class "books" and an ArrayList someBooks containing some, but not all, of the "books".
Using the contains() method worked fine when I wanted to see if a book from one ArrayList was also contained in the other.
The problem was that this stopped working when I saved both ArrayLists to a .bin file and loaded them back after the program restarted. Doing the same test as before, contains() returns false even when the compared objects are the same (have the same data inside).
I solved it by overriding the equals method and it works fine, but I want to know why this happened.
You will have to provide your own hash code and equals implementation. By default, it simply uses reference (pointer) equality, which obviously fails after the objects have been 'cloned' (through a serialize/deserialize cycle).
What happened was that when you originally created the lists, they both contained references to the same objects, but when you loaded them back in, they each got separate copies of those objects. Since each list got separate copies, they didn't contain the same references, meaning the objects didn't compare as equal without overriding the right method.
This sounds like the common issue of referential equality vs Equals, and is especially common with serialization. Override Equals (and GetHashCode) appropriately and you should be back in business.
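A minimal sketch of that override, assuming a hypothetical Book class identified by its ISBN and title. Equals compares the data instead of the reference, and GetHashCode is overridden alongside it so that hash-based collections stay consistent. ArrayList.Contains calls the virtual Equals(object), and List<T>.Contains goes through IEquatable<T>, so both find a freshly loaded copy.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Book type; equality is based on the data, not on the reference.
public sealed class Book : IEquatable<Book>
{
    public string Isbn { get; }
    public string Title { get; }

    public Book(string isbn, string title) { Isbn = isbn; Title = title; }

    public bool Equals(Book other) =>
        other != null && Isbn == other.Isbn && Title == other.Title;

    public override bool Equals(object obj) => Equals(obj as Book);

    // Must agree with Equals: equal books must produce equal hash codes.
    public override int GetHashCode() =>
        (Isbn?.GetHashCode() ?? 0) * 31 + (Title?.GetHashCode() ?? 0);
}
```

In a quick check, constructing a second Book with the same ISBN and title stands in for the deserialized copy: it is a different reference, yet Contains now finds it.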
For info, using ArrayList should generally be avoided unless you are using .NET 1.1 (or micro-framework), or have a very valid reason to do so; prefer the generic typed collections, such as List<T>.
Assuming Book is a class, by default Equals checks whether the references are equal. That cannot be the case when you load new objects. Overriding the Equals method is the right approach.
Other options are to change Book to a struct, or to use a more modern container, like a Dictionary or Hashtable, where you can store books by ID.
