EDIT 1
Kent has alleviated my fears. However, I've found one exception to the rule. I wrote a method for the Node class that would traverse up through the hierarchy until it got to the root Node and return its hash code. The hash code's the same all the way down the line except for one single object. In a word, RAGE.
I'm at the end of writing my first (relatively) large C# application. However, I think I've found a major bug that I screwed up on.
My app parses an XML file and creates a hierarchy of objects, each inheriting from a Node class, with Lists of Children and a Node Parent reference.
I needed to be able to copy this structure over. The concept is the initial structure holds the default data, and you can get your own copy and modify it while you use it. So I used a generic DeepClone<T> extension method to do it with BinaryFormatter.
My question, although I feel I already know (and dread) the answer, is: does this still leave me with the problem of reassigning the references of all those Parent and Child nodes?
Disclaimer: As I finish this, I'm realizing all the design mistakes I made and how they could have been avoided, including this one. In my defense, this semester at university will be the first time I take a data structures class. ;) I fully expect there to be some vital part of a tree that I failed to implement that would help solve this. >_<
No. The serialization process records the references between objects and the deserialization process restores those relationships.
EDIT: unless I misunderstood your question - it's not exactly clear what you meant. Once the client code has made changes to the cloned structure, it is up to you to incorporate those changes into the main data structure, if that's what you want to do.
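For concreteness, a BinaryFormatter-based DeepClone<T> of the kind described looks something like this (a sketch, assuming Node and its subclasses are marked [Serializable]):

    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    public static class CloneExtensions
    {
        public static T DeepClone<T>(this T source)
        {
            using (var stream = new MemoryStream())
            {
                var formatter = new BinaryFormatter();
                formatter.Serialize(stream, source);
                stream.Position = 0;
                // Deserialization rebuilds the entire object graph, so the
                // Parent/Child references inside the copy point at the
                // copied nodes, not the originals.
                return (T)formatter.Deserialize(stream);
            }
        }
    }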
You might want to take a look at IEditableObject for a more formalized way to handling this kind of thing. You may well continue to use serialization as your means of cloning the object prior to edits, but your interface will be more standardized.
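A minimal sketch of that shape (the Node type and the DeepClone call stand in for your own code): IEditableObject gives you BeginEdit/CancelEdit/EndEdit hooks around an edit session:

    using System.ComponentModel;

    public class EditableNode : IEditableObject
    {
        private Node _current;  // the live data
        private Node _backup;   // snapshot taken when editing begins

        public void BeginEdit()  { _backup = _current.DeepClone(); }  // clone before edits
        public void CancelEdit() { _current = _backup; }              // roll back
        public void EndEdit()    { _backup = null; }                  // commit
    }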
I have "successfully" implemented a non recombining trinomial tree to price certain fixed-income derivatives. (Something like shown in the picture below - but with three branches that don't reconnect)
Unfortunately it turned out that the number of nodes I can use is severely limited by the available memory. If I build a tree with 20 time-steps this results in 3^19 nodes (about 1.16 billion nodes).
The nodes of each time step are saved in a List<Node>, and these lists are stored in a Dictionary<double, List<Node>>.
Each node is instantiated via new Node(...). I also instantiate each of the lists and the dictionary via new Class(). Perhaps this is the source of my error.
Also, System.OutOfMemoryException isn't thrown because the Dictionary/List object is too large (as is often the case) but because I seem to have too many Nodes - after a while new Node(...) can't allocate any further memory. Eventually the 2 GB max object size of a List will also kick in, I think, seeing as each time step's List grows by a factor of three.
Perhaps my data-structure is too wasteful or not really suited for the task at hand.
A possible solution could be to save the tree to a text file, avoiding the memory problem completely. This however would necessitate a HUGE workaround.
Edit:
To add some more background: I need the tree to price path-dependent products, which means that unfortunately I will have to access all the nodes. What's more, after the tree has been built I start from the leaves and go backwards in time to determine the price. I also already generate only the nodes I need.
Edit2:
I have given the topic some thought and also considered the various responses. Could it be that I just need to serialize the respective tree levels to the hard drive? So basically, I create one time-step (List<Node>), write it to disk, etc. Later on, when I start from the leaves, I just have to load the levels back in reverse order.
You basically have two choices: evaluate only the branches you care about (Andrew's yield suggestion) and don't store results, or build up your tree, save it to disk, and implement a custom collection interface on top of it that accesses the right part of the disk. In the second case you still keep a minimal amount of data in your process's memory and rely on the OS to do proper disk caching to make access fast. If you start working with large data sets, the second option is a good tool to have in your tool belt, so you should probably write it with reuse in mind.
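As a sketch of the second option (file naming is hypothetical, and Node must be [Serializable]), each time step can be written out as soon as it is built and read back in reverse order for the backward induction:

    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    static void SaveLevel(int step, List<Node> level)
    {
        using (var fs = File.Create("level_" + step + ".bin"))
            new BinaryFormatter().Serialize(fs, level);
    }

    static List<Node> LoadLevel(int step)
    {
        using (var fs = File.OpenRead("level_" + step + ".bin"))
            return (List<Node>)new BinaryFormatter().Deserialize(fs);
    }

One caveat: if Node keeps references to nodes in other levels (parents/children), BinaryFormatter will drag the whole reachable graph into every file; those links would need to be [NonSerialized] or replaced by indices for each file to stay level-sized.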
What we have here is a classic problem of doing an enormous amount of processing up front... and then storing EVERYTHING into memory to be processed at a later time.
While simple, given harsh enough conditions (like having a billion entries), it will eat up all the memory.
Now, the OP didn't really specify what the intention of the tree was or how it was going to be used... but I would propose that instead of building it all at once... build it as you need it.
Lazy Evaluation with yield
Instead of doing everything all at once and having to store it... it might be ideal to do it ONLY when you actually require it. Check out this post for more info and examples of using yield.
This won't work great, though, if you need to traverse the tree a bunch of times - but it might still allow you to reach a deeper depth than you currently can.
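A minimal sketch of the idea (CreateChildren is a hypothetical method that builds a node's three branches on demand):

    using System.Collections.Generic;

    // Walk to the leaves lazily: nodes are produced one at a time and
    // never stored as a whole level.
    static IEnumerable<Node> Leaves(Node node, int depth)
    {
        if (depth == 0)
        {
            yield return node;
            yield break;
        }
        foreach (var child in node.CreateChildren())  // the three branches, built on demand
            foreach (var leaf in Leaves(child, depth - 1))
                yield return leaf;
    }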
I don't think serializing to disk will help much. For one, when you attempt to deserialize the list you will still run out of memory (as, to the best of my knowledge, there is no way to partially deserialize an object).
Have you considered changing your data structure into a relational database model and storing it in a SQL Server Express database?
This would give you the added benefit of performing queries with indexes instead of your custom tree traversal logic.
I'm trying to read data from a WSDL file and have got stuck, because there can be a big hierarchical tree and I don't know what kind of data structure to use for the inputs and outputs: an input can be an object, and that object can point to a couple of simple inputs and a second object... and this can go on and on. So I don't know what to use - maybe a tree, maybe indexes. What is the best practice, and can you give a small example of how the data could be handled?
P.S. I'm developing an automated test generation tool, which will use WSDL files for the generation.
Your best bet is to use good old classes. The first thing to do is to use a utility like svcutil.exe (the code generator tool) to create the client code from the WSDL. From this you will get an idea of how deep the tree is going to be.
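For reference, a typical invocation looks something like this (file names are placeholders):

    svcutil.exe YourService.wsdl /language:C# /out:ServiceClient.cs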
Once you have an object view of the structure, start creating classes and applying OOP design patterns. This will help with at least two things:
avoiding code duplication, and
making it clear, when you start constructing your objects in code, which node comes under which parent, etc.
Hope this helps.
Another thing to consider is using some sort of object serialization mechanism. Serialization will help you a great deal when dealing with complex tree-like data, from XML to objects and vice versa.
WSDL is based on XML, which already is a tree structure. Not sure why you want to read it into objects first -- just use Linq to XML to read the WSDL directly.
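For instance, a sketch using XDocument (the WSDL 1.1 namespace; service.wsdl is a placeholder):

    using System.Linq;
    using System.Xml.Linq;

    var wsdl = XDocument.Load("service.wsdl");
    XNamespace w = "http://schemas.xmlsoap.org/wsdl/";

    // One entry per operation, with its input/output message names.
    var operations = from op in wsdl.Descendants(w + "portType").Elements(w + "operation")
                     select new
                     {
                         Name   = (string)op.Attribute("name"),
                         Input  = (string)op.Element(w + "input")?.Attribute("message"),
                         Output = (string)op.Element(w + "output")?.Attribute("message")
                     };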
I'm using DataContractSerializer to save a large number of different classes, which make up tree structures, to XML files. I'm in the initial stages of writing this software, so at this point all the different components are changing around quite a bit. Yet every time I make a change to a class, I end up breaking my program's ability to open previously saved files.
My tree structures will still be functional if components are missing. Is there some way to tell DataContractSerializer to skip over data it has a problem deserializing and continue on, rather than just quitting at the first problem it has?
I know one answer would be to write my own serialization class, but I'd rather not spend the time to do that. I was hoping to still be able to take advantage of DataContractSerializer, but without it being an all-or-nothing situation.
I think what you're looking for is IExtensibleDataObject. This way, any unexpected elements get read into a name-value dictionary maintained internally, and can even be serialized back later. See the following resources for help, and the sketch after them.
Blog post -- WCF Extensibility – Other Serialization Extensions
Forward-Compatible Data Contracts
Data Contract Versioning
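A minimal sketch of a data contract opting in to this behaviour (the type and member are placeholders for your own classes):

    using System.Runtime.Serialization;

    [DataContract]
    public class TreeComponent : IExtensibleDataObject
    {
        [DataMember]
        public string Name { get; set; }

        // Elements the serializer doesn't recognize are stashed here on
        // deserialization instead of causing a failure, and are written
        // back out when the object is serialized again.
        public ExtensionDataObject ExtensionData { get; set; }
    }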
I have done some research already as to how I can achieve the title of this question. The app I am working on has been under development for a couple of years or so (slow progress though, you all know how it is in the real world). It is now a requirement for me to put in Undo/Redo multiple level functionality. It's a bit late to say "you should have thought about this before you started" ... well, we did think about it - and we did nothing about it and now here it is. From searching around SO (and external links) I can see that the two most common methods appear to be ...
Command Pattern
Memento Pattern
The command pattern looks like it would be a hell of a lot of work, I can only imagine it throwing up thousands of bugs in the process too so I don't really fancy that one.
The Memento pattern is actually a lot like what I had in my head for this. I was thinking that if there was some way to quickly take a snapshot of the object model currently in memory, then I would be able to store it somewhere (maybe in memory, maybe in a file). It seems like a great idea; the only problem I can see is how it will integrate with what we have already written. You see, the app as we have it draws images in a big panel (potentially hundreds) and then allows the user to manipulate them either via the UI or via a custom-built properties grid. The entire app is linked up with a big observer pattern. The second anything changes, events are fired and everything that needs to update does. This is nice, but I can't help thinking that if a user is entering text into a text field on the properties grid there will be a bit of a delay before the UI catches up (since every time the user presses a key, a new snapshot will be added to the undo list). So my question to you is...
Do you know of any good alternatives to the Memento pattern that might work?
Do you think the Memento pattern will fit in here, or will it slow the app down too much?
If the Memento pattern is the way to go, what is the most efficient way to make a snapshot of the object model (I was thinking of serialising it or something)?
Should the snapshots be stored in memory, or is it possible to put them into files?
If you have got this far, thank you kindly for reading. Any input you have will be valuable and very much appreciated.
Well, here are my thoughts on this problem.
1. You need multi-level undo/redo functionality, so you need to store the user actions performed, which can be kept in a stack.
2. Your second problem is identifying what has been changed by an operation; with the Memento pattern I think that is quite a challenge. Memento is all about storing the initial object state in memory.
Either way, you need to store what is changed by each operation so that you can use this information to undo it.
The Command pattern is designed for undo/redo functionality, and I would say that, late as it is, it's worthwhile to implement a design that has been used for years and works for most applications.
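For what it's worth, the skeleton of that approach is small (a sketch; the real work is writing a command class per user operation):

    using System.Collections.Generic;

    public interface IUndoableCommand
    {
        void Execute();
        void Undo();
    }

    public class CommandHistory
    {
        private readonly Stack<IUndoableCommand> _undo = new Stack<IUndoableCommand>();
        private readonly Stack<IUndoableCommand> _redo = new Stack<IUndoableCommand>();

        public void Do(IUndoableCommand command)
        {
            command.Execute();
            _undo.Push(command);
            _redo.Clear();  // a fresh action invalidates the redo chain
        }

        public void Undo() { var c = _undo.Pop(); c.Undo(); _redo.Push(c); }
        public void Redo() { var c = _redo.Pop(); c.Execute(); _undo.Push(c); }
    }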
If performance allows it you could serialize your domain before each action. A few hundred objects is not much if the objects aren't big themselves.
Since your object graph is probably non-trivial (i.e. uses inheritance, cycles, ...), the integrated XmlSerializer and JSON serializers are out of the question. Json.NET supports these, but does some lossy conversions on some types (local DateTimes, numbers, ...), so it's bad too.
I think the protobuf serializers need either some form of DTD (.proto file) or decoration of all properties with attributes mapping their name to a number, so it might not be optimal.
BinaryFormatter can serialize most stuff, you just need to decorate all classes with the [Serializable] attribute. But I haven't used it myself, so there might be pitfalls I'm not aware of. Perhaps related to Singletons or events.
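Putting that together, a minimal memento sketch (assuming the document root and everything reachable from it is [Serializable]):

    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    public class SnapshotUndoStack<T>
    {
        private readonly Stack<byte[]> _snapshots = new Stack<byte[]>();

        // Call before each user action.
        public void Snapshot(T state)
        {
            using (var ms = new MemoryStream())
            {
                new BinaryFormatter().Serialize(ms, state);
                _snapshots.Push(ms.ToArray());
            }
        }

        // Returns the previous state of the whole model.
        public T Undo()
        {
            using (var ms = new MemoryStream(_snapshots.Pop()))
                return (T)new BinaryFormatter().Deserialize(ms);
        }
    }

Storing raw byte arrays also makes it trivial to spill older snapshots to files if memory becomes a concern.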
The critical things for undo/redo are
knowing what state you need to save and restore
knowing when you need to save the state
Adding undo/redo after the fact is always a painful thing to do. (I know this comment is of no use to you now, but it's always best to design support into the application framework before you start, as it helps people use undo-friendly patterns throughout development.)
Possibly the simplest approach will be a memento-based one:
Locate all the data that makes up your "document". Can you unify this data in some way so that it forms a coherent whole? Usually if you can serialise your document structure to a file, the logic you need is in the serialisation system, so that gives you a way in. The down side to using this directly is that you will usually have to serialise everything, so your undo will be huge and slow. If possible, refactor the code so that (a) there is a common serialisation interface used throughout the application (so any and every part of your data can be saved/restored using a generic call), (b) every sub-system is encapsulated so that modifications to the data have to go through a common interface (rather than lots of people modifying member variables directly, they should all call an API provided by the object to request that it makes changes to itself), and (c) every sub-portion of the data keeps a "version number". Every time an alteration is made (through the interface in (b)) it should increment that version number. This approach means you can now scan your entire document and use the version numbers to find just the parts of it that have changed since you last looked, and then serialise the minimal amount to save and restore the changed state.
Provide a mechanism whereby a single undo step can be recorded. This means allowing multiple systems to make changes to the data structure, and then, when everything has been updated, triggering an undo recording. Working out when to do this may be tricky, but it can usually be accomplished by scanning your document for changes (see above) in your message loop, when your UI has finished processing each input event.
Beyond that, I'd advise going for a command based approach, because there are many benefits to it besides undo/redo.
You may find the Monitored Undo Framework to be useful. http://muf.codeplex.com/
It uses something similar to the memento pattern, by monitoring for changes as they happen and allows you to put delegates on the undo stack that will reverse / redo the change.
I considered an approach that would serialize / deserialize the document but was concerned about the overhead. Instead, I monitor for changes in the model (or view model) on a property-by-property basis. Then, as needed, I use the MUF library to "batch" related changes so that they undo / redo as a unit of change.
The fact that you have your UI setup to react to changes in the underlying model is good. It sounds like you could inject the undo / redo logic there and the changes would bubble up to the UI.
I don't think that you'd see much lag or performance degradation. I have a similar application, with a diagram that we render based on the data in the model. We've had good results with this so far.
You can find more info and documentation on the codeplex site at http://muf.codeplex.com/. The library is also available via NuGet, with support for .NET 3.5, 4.0, SL4 and WP7.
I'm designing a car catalogue and need to use XML files for storage. In previous projects, I was manually editing XML files with Linq. However, I came across XML serialization and am thinking this might be a better approach. Each item in the catalogue would be of type CarItem and contain various attributes. The catalogue can contain a few hundred cars and I'm worried about performance. If I deserialize the XML file, will all the CarItems be instantiated straight away? Is there a way for me to choose which objects get deserialized based on parameters? For example, I'd like to say "if the car's color attribute is red, then only deserialize red CarItems into objects".
Thanks for any suggestions
There are quite a few posts with good examples of how you can control what you pull out and instantiate into objects/scalars using XDocument.
Shawn Oster's post in this thread is, I believe, quite close to what you want using Linq. You can easily add where clauses to suit your requirements.
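As a sketch (the file name and the Model/Color attributes are assumptions about your catalogue format), only the elements that pass the filter are ever turned into CarItem objects:

    using System.Linq;
    using System.Xml.Linq;

    var doc = XDocument.Load("catalogue.xml");

    // Filter on the raw XML first, instantiate CarItem second.
    var redCars = doc.Descendants("CarItem")
                     .Where(c => (string)c.Attribute("Color") == "Red")
                     .Select(c => new CarItem
                     {
                         Model = (string)c.Attribute("Model"),
                         Color = (string)c.Attribute("Color")
                     })
                     .ToList();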
Yes, they will all be instantiated. However, a few hundred objects is not a big deal for a class with some simple fields. Give it a try and check the performance.