Cloning a graph and updating circular references in C#

Let me preface this by stating that I have seen similar posts to this, but none of the solutions have satisfied me and/or applied to C#.
I have a Graph class that consists of Node and Connection objects. The graph contains collections consisting of all of the child Node and Connection objects associated with it. In addition to this, each Node has a collection of Connection objects.
Please note: This is a simplified toy problem. You can view the actual (work-in-progress) production code here. In production, a Neuron is a Node and an Axon is a Connection.
public class Graph : IDeepCloneable<Graph>
{
// These are basically just Dictionary<int,object>s wrapped in an ICollection
public NodeCollection Nodes;
public ConnectionCollection Connections;
public Graph Clone()
{
return new Graph
{
Nodes = this.Nodes.Clone(),
Connections = this.Connections.Clone()
};
}
}
public class Node : IDeepCloneable<Node>
{
public int Id;
public NodeConnectionCollection Connections;
// NodeConnectionCollection is more or less the same as NodeCollection
// except that it stores Connection objects into '.Incoming' and '.Outgoing' properties
public Node Clone()
{
return new Node
{
Id = this.Id,
Connections = this.Connections.Clone()
};
}
}
public class Connection : IDeepCloneable<Connection>
{
public int Id;
public Node From;
public Node To;
public Connection Clone()
{
return new Connection
{
Id = this.Id,
From = this.From.Clone(),
To = this.To.Clone()
};
}
}
public class ConnectionCollection : ICollection<Connection>, IDeepCloneable<ConnectionCollection>
{
private Dictionary<int, Connection> idLookup;
private Dictionary<ProjectionKey, Connection> projectionLookup;
public int Count => idLookup.Count;
public bool IsReadOnly => false;
public void Add( Connection conn )
{
idLookup.Add( conn.Id, conn );
projectionLookup.Add( new ProjectionKey( conn.From.Id, conn.To.Id ), conn );
}
...
internal struct ProjectionKey
{
readonly int From;
readonly int To;
readonly int HashCode;
public ProjectionKey( int from, int to )
{
From = from;
To = to;
HashCode = ( 23 * 397 + from ) * 397 + to;
}
public override int GetHashCode() { return HashCode; }
}
}
public class NodeCollection : ICollection<Node>, IDeepCloneable<NodeCollection>
{
private Dictionary<int, Node> nodes;
private Dictionary<int, InputNode> inputNodes;
private Dictionary<int, InnerNode> innerNodes;
private Dictionary<int, OutputNode> outputNodes;
...
public Node this[ int id ]
{
get => nodes[ id ];
}
}
Each of these objects supports deep cloning, the main idea being that consuming classes can call Clone() on their child objects and work down the stack that way.
However, this is not viable in production. A call to Graph.Clone() clones the NodeCollection and ConnectionCollection fields, which clone every Node and Connection instance stored within them, and each of those in turn clones the objects it references - so the circular references cause the same nodes to be copied over and over instead of producing one consistent copy of the graph.
A common solution seems to be storing the Ids of each child object and then rebuilding the references when all cloning is complete. However, as far as I am aware, this would require a reference to parent objects and tightly couple the data structure.
I am very puzzled at how to properly approach this. I require a reasonable amount of performance, as my application (a genetic algorithm) performs cloning constantly, but in this case I am more interested in finding a robust design pattern or implementation that will allow me to perform deep cloning of this graph structure while stashing a lot of the gruntwork behind the scenes.
Is there any design pattern that will allow me to clone this data structure as-is while updating circular references and maintaining its integrity?
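For concreteness, here is roughly what the "clone everything first, then re-link by Id afterwards" idea mentioned above could look like when driven from the Graph itself, so no child object ever needs a parent reference. This is a hedged sketch against the simplified classes in this question, not the production code: it assumes a new Graph starts with empty, initialized collections, that NodeConnectionCollection has a parameterless constructor, and that it exposes Incoming/Outgoing collections with an Add method.
// Hedged sketch of two-pass cloning driven by the Graph (assumptions noted above).
public Graph Clone()
{
    var clone = new Graph();                      // assumed to start with empty collections
    var nodeMap = new Dictionary<int, Node>();    // old Id -> cloned Node

    // Pass 1: clone the nodes without touching their connections yet.
    foreach (var node in this.Nodes)
    {
        var nodeClone = new Node { Id = node.Id, Connections = new NodeConnectionCollection() }; // assumed ctor
        nodeMap[node.Id] = nodeClone;
        clone.Nodes.Add(nodeClone);
    }

    // Pass 2: clone the connections, pointing From/To at the cloned nodes.
    foreach (var conn in this.Connections)
    {
        var connClone = new Connection
        {
            Id = conn.Id,
            From = nodeMap[conn.From.Id],
            To = nodeMap[conn.To.Id]
        };
        clone.Connections.Add(connClone);
        connClone.From.Connections.Outgoing.Add(connClone);  // assumed API
        connClone.To.Connections.Incoming.Add(connClone);    // assumed API
    }
    return clone;
}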

My suggestion would be to change your approach to the problem from cloning to recreating. I've dealt with a similar problem, where I was saving a graph the user had created manually from the user interface, and then recreating it upon importing the saved graph. It sounds almost the same if you think about it.
So the solution I came up with was serializing the graph from a central point of control (since you are modifying graphs with a heuristic, I assume you have central control over the graph). Even if you don't have central control over the graph, I believe it can be traversed in a way that gathers all the information.
In its simplest form, a graph is a collection of neighborhood information, which can be directed or undirected:
1 -> 2
1 -> 3
3 -> 2
So if you can come up with a way to generate a list like this, after just tweaking this simple list, you can create your new graph.
Or another approach would be to list your nodes with their neighbors like below,
1, [2,3]
3, [2]
This would make it even simpler to recreate the graph, in my opinion.
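Either way, the recreation step can be a small factory. Here is a rough sketch that rebuilds a Graph from the simple (from, to) pair form, reusing the type names from the question; the collection APIs and the way connection Ids are assigned here are assumptions, not the production behavior.
// Hedged sketch: recreate a Graph from plain (from, to) node-id pairs.
public static Graph FromEdgeList(IEnumerable<(int From, int To)> edges)
{
    var graph = new Graph();                     // assumed to start with empty collections
    var nodesById = new Dictionary<int, Node>();
    int nextConnectionId = 0;

    Node GetOrAdd(int id)
    {
        if (!nodesById.TryGetValue(id, out var node))
        {
            node = new Node { Id = id };
            nodesById[id] = node;
            graph.Nodes.Add(node);
        }
        return node;
    }

    foreach (var (from, to) in edges)
    {
        graph.Connections.Add(new Connection
        {
            Id = nextConnectionId++,             // connection Ids are invented here
            From = GetOrAdd(from),
            To = GetOrAdd(to)
        });
    }
    return graph;
}
Tweaking the graph (mutation, crossover, etc.) then happens on the plain value pairs before the object graph is rebuilt.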
Here is the file from the project where I applied this approach, if you are curious - I don't think it would serve as a reference for the answer or question, though.

Related

Pagination Newtonsoft.Json

A problem I've been running into recently is dealing with pagination of Json I receive from the server. I can work around some instances but would like a better approach. So the structure I receive here illustrates a problem I can run into:
Modules
{
ID
Title
Description
Lessons
{
edges
{
node
{
ID
}
}
}
}
So for the lessons array, the server inserts the edges and node elements because of the use of pagination. So what I would expect instead is:
Modules
{
ID
Title
Description
Lessons
{
ID
}
}
The main problem however with this is that it stops me being able to deserialize the object easily, i.e. I can't do this:
Module[] modules = JsonConvert.DeserializeObject<Module[]>(json, settings);
My Lesson and Module classes, for reference, are just:
public class Lesson
{
public int ID;
}
[System.Serializable]
public class Module
{
public string ID;
public string Title;
public string Description;
public Lesson[] Lessons;
}
So I just wondered if anyone else has come across a similar issue and what solutions they've used to work around it?
Your serialization should work just fine if you create concrete objects for 'edges' and 'node'.
I assume 'lessons' is an array of 'edge' objects. The 'edge' object contains the 'node' object, which has the property 'ID'.
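For example, the concrete classes might look something like this (a sketch only; the type names here are made up to mirror the JSON shape):
// Hedged sketch: concrete wrapper classes matching the paginated JSON.
[System.Serializable]
public class LessonEdges          // the "Lessons" object
{
    public LessonEdge[] edges;    // the "edges" array
}

[System.Serializable]
public class LessonEdge
{
    public Lesson node;           // the "node" object holding the actual Lesson
}

// Module then declares:  public LessonEdges Lessons;  instead of  public Lesson[] Lessons;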
So I worked out a bit of a workaround (sort of following Zero Cool's answer) which is to make generic Edges and Node classes. i.e.
[System.Serializable]
public class Edges<T>
{
public Node<T>[] edges;
}
public class Node<T>
{
public T node;
}
And then declare them where I am using pagination. It's not lovely, but it works alright as I usually know what is and isn't paginated.
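Roughly, the declaration and usage end up like this (a sketch, assuming the Module and Lesson classes from the question and that System.Linq is available for the unwrapping step):
// Hedged sketch: wrap only the paginated field in the generic helpers.
[System.Serializable]
public class Module
{
    public string ID;
    public string Title;
    public string Description;
    public Edges<Lesson> Lessons;   // wrapped because "Lessons" is paginated
}

// Deserialization stays a one-liner...
Module[] modules = JsonConvert.DeserializeObject<Module[]>(json);
// ...and the plain lessons can be unwrapped when needed:
Lesson[] lessons = modules[0].Lessons.edges.Select(e => e.node).ToArray();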
Would still be interested if there's a cleaner way.

Defensive Code to prevent Infinite Recursion in Parent/Child Hierarchy

Given an Object as Such
public class Thing
{
public Thing() { this.children = new List<Thing>();}
public int Id {get; set;}
public string Name {get; set;}
public List<Thing> children{ get; set;}
public string ToString(int level = 0)
{
//Level is added purely to add a visual hierarchy
var sb = new StringBuilder();
sb.Append(new String('-',level));
sb.AppendLine($"id:{Id} Name:{Name}");
foreach(var child in children)
{
sb.Append(child.ToString(level + 1));
}
return sb.ToString();
}
}
and if used (abused!?) in such a way
public static void Main()
{
var root = new Thing{Id = 1,Name = "Thing1"};
var thing2 = new Thing{Id = 2,Name = "Thing2"};
var thing3 = new Thing{Id = 3,Name = "Thing3"};
root.children.Add(thing2);
thing2.children.Add(thing3);
thing3.children.Add(root); //problem is here
Console.WriteLine(root.ToString());
}
How does one code defensively against this kind of scenario?
This code as it stands recurses infinitely and crashes with a stack overflow (or an out-of-memory error).
In a (IIS) website this was causing the w3 worker processes to crash, and eventually the app pool to shut down (Rapid-Fail Protection)
The code above is indicative only to reproduce the problem. In the actual scenario, the structure is coming from a database with Id and ParentId.
Database table structure similar to
CREATE TABLE Thing(
Id INT NOT NULL PRIMARY KEY,
Name NVARCHAR(255) NOT NULL,
ParentThingId INT NULL -- references self
)
The issue is that users creating 'things' are not prevented from forming an incestuous relationship (i.e. a parent could have children, who could have children, etc., until one eventually points at the original parent again). One could put a constraint on the db to prevent a thing from being its own parent (makes sense), but depending on depth this could get ugly, and there is some argument that a circular reference may be required (we are still debating this....)
So arguably the structures can be circular, but if you want to render this kind of structure on a web page, say as a <ul><li><a> kind of thing in a parent/child menu, how does one proactively deal with this user-generated data issue in code?
.NET fiddle here
One way would be to include a collection of visited nodes in the recursive call. If visited before you are in a cycle.
public string ToString(int level, HashSet<int> visited)
{
    var sb = new StringBuilder();
    sb.Append(new string('-', level));
    sb.AppendLine($"id:{Id} Name:{Name}");
    foreach (var child in children)
    {
        if (visited.Add(child.Id))
            sb.Append(child.ToString(level + 1, visited));
        else
            sb.AppendLine($"{new string('-', level + 1)}[cycle detected: id:{child.Id}]"); // handle the cycle however suits you
    }
    return sb.ToString();
}
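The call site then seeds the set with the root's own Id, so a cycle straight back to the root is caught as well - a minimal usage sketch:
// Hedged usage sketch for the visited-set overload above.
Console.WriteLine(root.ToString(0, new HashSet<int> { root.Id }));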
You can unfold the tree structure by putting each element on a stack or queue and popping items off while the collection has items. In the loop you put the children of each item on the queue.
If you care about the level of the item in the tree, you can use a helper object that stores it.
Edit:
While unfolding the tree you can put each item on a new list and use that as a reference to detect circular references.
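A rough sketch of that unfolding, assuming the Thing class from the question (the level is carried along in a tuple as the helper object; a Stack<T> instead of the Queue<T> gives an order closer to the recursive version):
// Hedged sketch: iterative rendering with a queue and a visited set, no recursion.
public static string Render(Thing root)
{
    var sb = new StringBuilder();
    var visited = new HashSet<int>();
    var queue = new Queue<(Thing Node, int Level)>();
    queue.Enqueue((root, 0));

    while (queue.Count > 0)
    {
        var (current, level) = queue.Dequeue();
        if (!visited.Add(current.Id))
            continue;                              // seen before: cycle detected, skip it

        sb.Append(new string('-', level));
        sb.AppendLine($"id:{current.Id} Name:{current.Name}");

        foreach (var child in current.children)
            queue.Enqueue((child, level + 1));
    }
    return sb.ToString();
}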
If you can a) eliminate the possibility of wanting circular references and b) guarantee that all children are already known when the parent is created, it's a great opportunity to make children an immutable collection that's only set via the constructor.
That gives you a class that, by structural recursion, you know cannot contain any loops, no matter how big the overall structure is. Something like:
public sealed class Thing
{
public Thing(IEnumerable<Thing> children) {
this._children = children.ToList().AsReadOnly();
}
private readonly ReadOnlyCollection<Thing> _children;
public int Id {get; set;}
public string Name {get; set;}
public IEnumerable<Thing> children {
get {
return _children;
}
}
public string ToString(int level = 0)
{
//Level is added purely to add a visual hierarchy
var sb = new StringBuilder();
sb.Append(new String('-',level));
sb.AppendLine($"id:{Id} Name:{Name}");
foreach(var child in children)
{
sb.Append(child.ToString(level + 1));
}
return sb.ToString();
}
}
Now, of course, those conditions I have stated above are quite big "if"s, so you need to consider whether it's a good fit for you.
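Under those conditions construction necessarily runs leaves-first, so there is simply no way to hand a node its own ancestor - a quick usage sketch:
// Hedged usage sketch: children must exist before their parent is constructed.
var thing3 = new Thing(new Thing[0]) { Id = 3, Name = "Thing3" };
var thing2 = new Thing(new[] { thing3 }) { Id = 2, Name = "Thing2" };
var root = new Thing(new[] { thing2 }) { Id = 1, Name = "Thing1" };
Console.WriteLine(root.ToString());
// thing3's children collection is read-only, so root can never be added to it.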

Find orphaned elements within a hierarchy

I am struggling to find a good solution for this. It's fairly straightforward to find the orphaned elements, but the trouble is storing them in such a way that they can easily be merged back into the hierarchy at a later point.
I have the following abstract class, which has multiple implementations:
public abstract class FilterElement
{
public abstract string ID { get; }
public abstract IEnumerable<FilterElement> Children { get; set; }
public FilterElement Parent { get; set; }
}
I have two hierarchies of FilterElement - the "master" (i.e. the main structure), and the "filters". The filters point at elements in the master - however, if these master elements do not exist, I wish to create a third structure, the "orphans".
I'm struggling to do this. While it's easy to identify the orphaned elements, I don't know how to store them effectively. This is the current solution:
Note: "GetFlatKey" returns a unique key for the element based on it's parents & children, and "RecursiveSelect" effectively flattens the hierarchy:
private IEnumerable<FilterElement> GetOrphanedFilterElements(
IEnumerable<FilterElement> filters,
IEnumerable<IFilterFileViewModel> visibleList)
{
var flattenedMasterList = visibleList.Cast<IFilterViewModel>()
.RecursiveSelect(f => f.Children)
.Select(x => x.GetFlatKey).ToList();
var orphanedFilterFiles = new List<FilterElement>();
foreach (var f in filters.RecursiveSelect(f => f.Children))
{
// Collect orphaned files.
if (!flattenedMasterList.Contains(f.GetFlatKey))
{
orphanedFilterFiles.Add(f);
}
}
return orphanedFilterFiles;
}
The problem with this is that the elements in the orphanedFilterFiles list contain references to other elements - e.g. An orphan will have a parent, which may have non-orphaned Children. This makes it difficult to merge back into the final hierarchy, which is the main issue.
Can anyone help me find a better solution, or just tell me what I'm doing wrong?

object linking consistency over network with binary serialisation

I'm asking for an explanation more than anything, about C# objects.
[Serializable]
class ExampleSub
{
public Example parent;
public ExampleSub(Example parent)
{
this.parent = parent;
}
}
[Serializable]
class Example
{
List<ExampleSub> subs = new List<ExampleSub>();
public Example()
{
for (int i = 0; i < 10; i++)
subs.Add(new ExampleSub(this));
}
}
If I were to binary-serialise the Example class, with it containing the ExampleSubs in the list, then send it over a TCP connection, where on the other end it is deserialised back into an Example object - would each ExampleSub's parent point at the new Example object (as it should)?
My question mostly revolves around how the serialiser correctly maps the objects back together, but also around how C# objects are identified at all, really, since the closest I can guess is that they act like smart pointers.
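As far as I know, BinaryFormatter tracks object references within a single serialized graph, so the deserialised parent fields all point at the one new Example instance rather than at separate copies. A hedged sketch of checking that (it assumes subs is made public so it can be inspected; note that BinaryFormatter is obsolete and unsafe for untrusted input on current .NET, and is used here only because the question is about it):
// Hedged sketch: round-trip through BinaryFormatter and verify the parent links.
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

var original = new Example();

var formatter = new BinaryFormatter();
using var stream = new MemoryStream();
formatter.Serialize(stream, original);
stream.Position = 0;

var copy = (Example)formatter.Deserialize(stream);

// Assumes 'subs' was made public; prints True because object identity is preserved.
Console.WriteLine(ReferenceEquals(copy.subs[0].parent, copy));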

Objects containing list of same object type

Is there anything wrong with defining something like this:
class ObjectA
{
property a;
property b;
List <ObjectA> c;
...
}
No, and because the answer needs at least 30 characters, I'll add that this is a common pattern.
Since you included the oop tag, though, I'll add that this pattern gives a lot of control to the outside world. If c is a list of children, for example, you're giving everyone who has access to an instance of ObjectA the ability to add, delete, or replace its children.
A tighter approach would be to use some sort of read-only type (perhaps implementing IList<ObjectA>) to expose the children.
EDIT
Note that the following still allows others to modify your list:
class ObjectA
{
property a;
property b;
List <ObjectA> c;
...
public List<ObjectA> Children { get { return c; } }
}
The absence of a setter only prevents outsiders from replacing the list object.
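A sketch of the tighter version (names are placeholders; the point is that callers can enumerate the children but not mutate them):
// Hedged sketch: expose children as a read-only view; mutation goes through the owner.
using System.Collections.Generic;

class ObjectA
{
    private readonly List<ObjectA> c = new List<ObjectA>();

    public IReadOnlyList<ObjectA> Children => c.AsReadOnly();

    public void AddChild(ObjectA child) => c.Add(child);   // the class enforces its own rules here
}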
Nope. That's perfectly acceptable. Tree structures do this.
It is perfectly valid. For example, you would have to do something like this to build a tree data structure (parent node contains a list of child nodes).
I have to ask whether your question is about putting a List< > in there, or about putting a List< ObjectA > inside of ObjectA - and the answer to both questions is "Yes"!
The thing to keep in mind is that by default, the access is private. If you want other classes to use this list, then you need to add a few things to your class...
class ObjectA
{
property a;
property b;
List <ObjectA> c;
// allow access, but not assignment:
// you can still modify the list from outside, you just can't
// assign a new list from outside the class
public List<ObjectA> somePropertyName { get { return this.c; } }
// same as above, but also allow derived child classes to set the list
public List<ObjectA> somePropertyName { get { return this.c; }
protected set { this.c = value; } }
// allow all access
public List<ObjectA> somePropertyName { get { return this.c; }
set { this.c = value; } }
// (the three versions above are alternatives for the same property - pick one)
}
No. This is valid. Many structures use this graph-like pattern.
If you e.g. have a base collection class
namespace MiniGraphLibrary
{
public class GraphCollection
{
public Node Root { set; get; }
public Node FindChild(Node root)
{
throw new NotImplementedException();
}
public Node InsertNode(Node root, Node nodeToBeInserted)
{
throw new NotImplementedException();
}
}
}
Then you can have the node act like this:
namespace MiniGraphLibrary
{
public class Node
{
private string _info;
private List<Node> _children = new List<Node>();
public Node(Node parent, string info)
{
this._info = info;
this.Parent = parent;
}
public Node Parent { get; set; }
public void AddChild(Node node)
{
if (!this.DoesNodeContainChild(node))
{
node.Parent = this;
_children.Add(node);
}
}
public bool DoesNodeContainChild(Node child)
{
return _children.Contains(child);
}
}
}
Note that this is something I wrote in 2 minutes, and it is probably not good for production, but the 2 main things are that you have a parent node and many children. When you add a child node to a given node, you make sure that it has its parent node set. Here I first check whether the child is already in the children list before connecting the two.
You could make some changes to the code to make sure that a child that gets re-parented is also removed from the children list of the parent it was already connected to. I have not done that here.
I have made this to illustrate how it could be used. And it is used in many places. For example, clustered indexes in MSSQL use some sort of tree-like representation. But I am NOT an expert on this subject, so correct me if I am wrong.
I have not implemented the two methods in the GraphCollection class. The downside of my little example is that if you are going to implement the Find method, you have to go through the whole graph. You could instead make a binary tree that only has two children:
namespace MiniTreeLibrary
{
public class SimpleNode
{
private string _info;
private SimpleNode _left;
private SimpleNode _right;
private SimpleNode _parent;
public SimpleNode(SimpleNode parent, string info)
{
this._info = info;
this.Parent = parent;
}
public SimpleNode Parent { get; private set; }
}
}
I have omitted the insertion of the right and left children. Now with this binary tree you could do some pretty darn fast searching, if you wanted!! But that is another discussion.
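For the curious, a rough sketch of what that fast lookup relies on - a minimal, hypothetical node type keyed by string, not the SimpleNode above:
// Hedged sketch: binary search tree lookup; each comparison discards half of the remaining tree.
public class BstNode
{
    public string Key;
    public BstNode Left;    // keys that compare smaller
    public BstNode Right;   // keys that compare larger

    public BstNode Find(string key)
    {
        int cmp = string.CompareOrdinal(key, Key);
        if (cmp == 0) return this;
        var next = cmp < 0 ? Left : Right;
        return next?.Find(key);
    }
}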
There are many rules when it comes to trees and graphs, and my graph isn't even a real graph. But I have put these examples here so you can see that the pattern is used a lot!! If you want to go more into linear and other data structures, then see this series of articles. Parts 3, 4 and 5 talk a lot more about trees and graphs.
