I have a recursive function that builds a node list from an IEnumerable of about 2000 records. The procedure currently takes around 9 seconds to complete and has become a major performance issue. The function serves to:
a) sort the nodes hierarchically
b) calculate the depth of each node
This is a stripped down example:
public class Node
{
    public string Id { get; set; }
    public string ParentId { get; set; }
    public int Depth { get; set; }
}

private void GetSortedList()
{
    // next line pulls the nodes from the DB, not included here to simplify the example
    IEnumerable<Node> ie = GetNodes();
    var l = new List<Node>();
    foreach (Node n in ie)
    {
        if (string.IsNullOrWhiteSpace(n.ParentId))
        {
            n.Depth = 1;
            l.Add(n);
            AddChildNodes(n, l, ie);
        }
    }
}

private void AddChildNodes(Node parent, List<Node> newNodeList, IEnumerable<Node> ie)
{
    foreach (Node n in ie)
    {
        if (!string.IsNullOrWhiteSpace(n.ParentId) && n.ParentId == parent.Id)
        {
            n.Depth = parent.Depth + 1;
            newNodeList.Add(n);
            AddChildNodes(n, newNodeList, ie);
        }
    }
}
What would be the best way to rewrite this to maximize performance? I've experimented with the yield keyword but I'm not sure that will get me the result I am looking for. I've also read about using a stack but none of the examples I have found use parent IDs (they use child node lists instead), so I am a little confused on how to approach it.
Recursion is not what is causing your performance problem. The real problem is that on each recursive call to AddChildNodes, you traverse the entire list to find the children of the current parent, so your algorithm ends up being O(n^2).
To get around this, you can create a dictionary that, for each node Id, gives a list of all its children. This can be done in a single pass of the list. Then, you can start with the root Id ("") and recursively visit each of its children (i.e. a "depth first traversal"). This will visit each node exactly once. So the entire algorithm is O(n). Code is shown below.
After calling GetSortedList, the sorted result is in result. Note that you could make children and result local variables in GetSortedList and pass them as parameters to DepthFirstTraversal, if you prefer. But that unnecessarily slows down the recursive calls, since those two parameters would always have the same values on each recursive call.
You can get rid of the recursion using stacks, but the performance gain would probably not be worth it.
Dictionary<string, List<Node>> children = null;
List<Node> result = null;

private void GetSortedList()
{
    IEnumerable<Node> ie = GetNodes(); // your data source, as in the question

    // Construct the dictionary: parent id -> list of its children.
    children = new Dictionary<string, List<Node>>();
    foreach (var n in ie)
    {
        // Treat a null/whitespace ParentId as the root key "", matching the question's root check.
        string parentId = string.IsNullOrWhiteSpace(n.ParentId) ? "" : n.ParentId;
        if (!children.ContainsKey(parentId))
        {
            children[parentId] = new List<Node>();
        }
        children[parentId].Add(n);
    }

    // Depth first traversal
    result = new List<Node>();
    DepthFirstTraversal("", 1);

    if (result.Count() != ie.Count())
    {
        // If there are cycles, some nodes cannot be reached from the root,
        // and therefore will not be contained in the result.
        throw new Exception("Original list of nodes contains cycles");
    }
}

private void DepthFirstTraversal(string parentId, int depth)
{
    if (children.ContainsKey(parentId))
    {
        foreach (var child in children[parentId])
        {
            child.Depth = depth;
            result.Add(child);
            DepthFirstTraversal(child.Id, depth + 1);
        }
    }
}
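If you do want to eliminate the recursion, here is a minimal sketch of the same traversal driven by an explicit Stack<Node> over the children dictionary above (the method name is arbitrary); the output order matches the recursive version:
private void DepthFirstTraversalIterative()
{
    result = new List<Node>();
    var stack = new Stack<Node>();

    // Seed the stack with the root-level nodes, pushed in reverse so they
    // are popped in their original order.
    List<Node> roots;
    if (children.TryGetValue("", out roots))
    {
        for (int i = roots.Count - 1; i >= 0; i--)
        {
            roots[i].Depth = 1;
            stack.Push(roots[i]);
        }
    }

    while (stack.Count > 0)
    {
        Node current = stack.Pop();
        result.Add(current);

        List<Node> kids;
        if (children.TryGetValue(current.Id, out kids))
        {
            for (int i = kids.Count - 1; i >= 0; i--)
            {
                kids[i].Depth = current.Depth + 1;
                stack.Push(kids[i]);
            }
        }
    }
}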
I have the following class for implementing a sorted linked list in C#. Right now it isn't actually sorted; what changes do I have to make to this method for it to become one?
class LinkedSortedList<T>
{
    public Node<T> Head { get; set; }
    public Node<T> Tail { get; set; }
    public int Count { get; set; }

    public LinkedSortedList()
    {
        Head = null;
        Tail = null;
        Count = 0;
    }

    public LinkedSortedList(T data)
    {
        CreateList(data);
    }

    public void AddElement(T data)
    {
        if (Tail != null)
        {
            var node = new Node<T>(data);
            Tail.Next = node;
            Tail = node;
            Count++;
        }
        else
        {
            CreateList(data);
        }
    }

    public void CreateList(T data)
    {
        var node = new Node<T>(data);
        Head = node;
        Tail = node;
        Count = 1;
    }
}
I want to modify the AddElement function so that the list is, and remains, sorted. How can I implement this logic?
A key observation you need to make in order to complete the task is that at the beginning of AddElement the list is either empty, or sorted. If the list is empty, your task is trivial; if the list is sorted, you must pick an insertion point for the element being added, and insert the new element there. The list will remain sorted after the insertion, because no other elements would need to be moved.
To find the insertion point, start walking the list from the head until you either (1) find an element that is greater than the one being inserted, or (2) you reach the end of the list. In both cases you simply insert the new element immediately after the last element that you've passed during the traversal, or at the head if the initial element is greater than the one you are inserting.
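A rough sketch of just that search, assuming we are inside AddElement, T is constrained to IComparable<T>, and Node<T> exposes Data and Next as in the question:
// 'current' ends up as the first node whose value is greater than 'data',
// or null if the new element belongs at the tail.
Node<T> current = Head;
while (current != null && current.Data.CompareTo(data) <= 0)
{
    current = current.Next;
}
// Insert the new node just before 'current' (or after Tail when current == null).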
Currently you're just adding elements to the end of the list. Instead, we want to walk the list and look at each value until we find the correct place to insert the node.
One way to do this is to do the following checks:
Is the Head null? If so, create a new list with this data and we're done.
Is the new node's data less than the Head data? If so, put this node in front of the Head and point the Head to our new node.
Is the new node's data greater than the Tail data? If so, put this node after the Tail and point the Tail to our new node.
Otherwise, create a "current" node that points to the Head, compare the data of the current node with our new node, and continue to move through the list by setting "current" to its Next property until we find the proper place to insert the new node. Then rearrange the Next and Previous properties of the three nodes (the new node, the one before it, and the one after it).
For example:
// Note: this assumes T is constrained with "where T : IComparable<T>"
// so values can be compared via CompareTo.
public void AddElement(T data)
{
    if (Head == null)
    {
        CreateList(data);
    }
    else
    {
        var node = new Node<T>(data);
        if (node.Data.CompareTo(Head.Data) < 0)
        {
            // New smallest element: insert in front of the Head.
            Head.Previous = node;
            node.Next = Head;
            Head = node;
        }
        else if (node.Data.CompareTo(Tail.Data) > 0)
        {
            // New largest element: insert after the Tail.
            Tail.Next = node;
            node.Previous = Tail;
            Tail = node;
        }
        else
        {
            // Walk until 'current' is the first node we should insert in front of.
            var current = Head;
            while (node.Data.CompareTo(current.Data) >= 0 && current.Next != null)
            {
                current = current.Next;
            }
            node.Previous = current.Previous;
            if (node.Previous != null) node.Previous.Next = node;
            current.Previous = node;
            node.Next = current;
            if (Head == current) Head = node;
        }
        // Count++ belongs here so every insertion path updates it
        // (CreateList already sets Count to 1).
        Count++;
    }
}
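For example, assuming the class is declared with where T : IComparable<T> as noted in the comment above, and Node<T> has Data/Next/Previous:
var list = new LinkedSortedList<int>();
list.AddElement(5);
list.AddElement(2);
list.AddElement(8);
// Walking the list from Head now yields 2, 5, 8, and Count is 3.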
I'm looking for some advice on a good way to achieve my goal. I feel like my pseudocode below is redundant and that there must be a more efficient way of doing it. My question: is there a better solution to this, or a more efficient way? So here is the setup...
I have a class called Node which has two properties
class Node
{
    bool favorite;
    string name;
}
I have a list which contains around a thousand of these Nodes.
I want to give users three features:
A way to filter the list to show just favorites; otherwise, if favorites is false, it displays the original list
Ability to search by string/name comparison
The ability for both the search and favorite to work in combination
Below is my pseudocode; it describes the approach, though it isn't ideal. You can read the comments in the code to get the main gist.
// initial collection of nodes
list<Nodes> initialnodesList = [];
// list of nodes which are displayed in UI
list<Nodes> displayNodes = [];

public void FilterNodes()
{
    list<Nodes> tempNodesList = [];
    if (favoritesEnabled)
    {
        // collect favorites
        foreach (n in initialnodesList)
            if (n.favorite)
                tempNodesList.add(n);

        // search within favorites if needed and create new list
        list<Nodes> searchedNodesList = [];
        if (!isStringNullWhiteSpace(searchString))
        {
            foreach (n in tempNodesList)
                if (n.name == searchString)
                    searchedNodesList.add(n);
            displayNodes = searchedNodesList;
            return;
        }
        else
        {
            // no search: just show the favorites
            displayNodes = tempNodesList;
            return;
        }
    }
    else
    {
        // search within initial node collection if needed and create new list
        list<Nodes> searchedNodesList = [];
        if (!isStringNullWhiteSpace(searchString))
        {
            foreach (n in initialnodesList)
                if (n.name == searchString)
                    searchedNodesList.add(n);
            displayNodes = searchedNodesList;
            return;
        }
        // if search is not needed and favorites were not enabled then just return the original node collection
        displayNodes = initialnodesList;
        return;
    }
}
You can simplify your code with a single LINQ statement that filters on searchString and the favorite option.
public List<Node> FilterNodes(bool searchFavorite, string searchString)
{
    // Keep a node when it matches the search text (or no search text is given)
    // and, when the favorites filter is on, when it is a favorite.
    return initialnodesList
        .Where(l => (string.IsNullOrEmpty(searchString) || l.name.StartsWith(searchString, StringComparison.OrdinalIgnoreCase))
                    && (!searchFavorite || l.favorite))
        .ToList();
}
This example matches with StartsWith; change it to Contains if the search should match anywhere in the name rather than just the beginning.
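For example, the display list from your pseudocode could then be refreshed with a single call (field names taken from the question):
displayNodes = FilterNodes(favoritesEnabled, searchString);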
I want to build a tree structure like this:
root Id 1
child id 2
grandChild id 3
Code sample below. If I use GetChildrenNodesCorrect(), I get the correct result. But when GetChildrenNodesWrong() is used, it returns the following:
root Id 1
child id 2
Null
I know that ToList() is not deferred execution and returns the result immediately. Could anyone explain this?
public class ToListTest
{
    public static void Entry()
    {
        var toListTest = new ToListTest();
        toListTest.Test();
    }

    public void Test()
    {
        List<Node> newsList = new List<Node>
        {
            new Node { Id = 1, ParentId = 0 },
            new Node { Id = 2, ParentId = 1 },
            new Node { Id = 3, ParentId = 2 }
        };
        var root = BuildUpTree(newsList);
    }

    private TreeNode BuildUpTree(List<Node> newsList)
    {
        var root = new TreeNode { currentNode = newsList.First(n => n.ParentId == 0) };
        BuildUpTreeChildrenNodes(newsList, root);
        return root;
    }

    private void BuildUpTreeChildrenNodes(List<Node> newsList, TreeNode currentTreeNode)
    {
        currentTreeNode.Children = GetChildrenNodesWrong(newsList, currentTreeNode);
        foreach (var node in currentTreeNode.Children)
        {
            BuildUpTreeChildrenNodes(newsList, node);
        }
    }

    private IEnumerable<TreeNode> GetChildrenNodesWrong(List<Node> newsList, TreeNode currentNode)
    {
        return newsList.Where(n => n.ParentId == currentNode.currentNode.Id)
            .Select(n => new TreeNode
            {
                currentNode = n
            });
    }

    private IEnumerable<TreeNode> GetChildrenNodesCorrect(List<Node> newsList, TreeNode currentNode)
    {
        return GetChildrenNodesWrong(newsList, currentNode).ToList();
    }

    public class TreeNode
    {
        public Node currentNode { get; set; }
        public IEnumerable<TreeNode> Children { get; set; }
    }

    public class Node
    {
        public int Id { get; set; }
        public int ParentId { get; set; }
    }
}
Update
In the debugger, when using GetChildrenNodesWrong(), root has both the child and the grandchild before the method returns. After the method returns, root has only the child, and the grandchild is null.
Update 2
IMO, the problem might not be related to clean code. But anyone is welcome to show more intuitive code.
Every time the IEnumerable is evaluated the Linq query is re-executed. So, when you're computing the tree, it is allocating space for nodes but not assigning them to any permanent variable. This means that in the foreach loop in BuildUpTreeChildrenNodes, you are not calling the recursive function on the instance of the node you want. Instead, you're calling it on a re-instantiated version of the node that has been created by the foreach loop (which enumerates the IEnumerable). When you call ToList on the IEnumerable instead, then the foreach loop will return the elements of the list, which is in memory.
If you make root public static, and then debug your code, you'll see that when you call BuildUpTreeChildrenNodes, the node argument is not the instance of the node that you want. Even though it has the same ID and represents the same node in the graph, it is not actually connected in any way to the root node. Check:
root.Children.Any(n => n.currentNode.Id == node.currentNode.Id) // true
root.Children.Contains(node)                                    // false
The simplest way to see your problem is here:
//Create a singleton Node list:
var nodeSingleton = Enumerable.Range(0, 1).Select(x => new Node { Id = x });
Console.Write(nodeSingleton.Single() == nodeSingleton.Single());
You might expect this to return true, but in fact it will be false - both times the Single Linq method is called, the deferred execution of the singleton variable is re-evaluated, and returns a different instance of the Node class.
If you call ToList on the singleton, however, then you get the list in memory and the Single method will return the same instance of the Node.
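Applied to your tree builder, the minimal fix is to materialize the children once before recursing, which is effectively what GetChildrenNodesCorrect already does:
private void BuildUpTreeChildrenNodes(List<Node> newsList, TreeNode currentTreeNode)
{
    // ToList() runs the query once, so Children holds the very same TreeNode
    // instances that the recursion below fills in.
    currentTreeNode.Children = GetChildrenNodesWrong(newsList, currentTreeNode).ToList();
    foreach (var node in currentTreeNode.Children)
    {
        BuildUpTreeChildrenNodes(newsList, node);
    }
}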
More broadly, I think the problem with this code is that it mixes up imperative and functional code too much. It is strange that so many of the methods are void and then the GetChildrenNodesWrong method is not. I think you should pick a style and stick with it, since switching paradigms can be confusing.
I'm not entirely sure what you're asking. All LINQ queries have deferred execution; when you call ToList() you're simply forcing the query to execute. I think the main problem is in your Where clause: only two objects satisfy the condition, so the IEnumerable returned by your LINQ query should only have 2 objects.
It's not doing what you expect because the LINQ query in GetChildrenNodesWrong is producing an "off by one" error. Here is basically what happens:
1) We feed it root. For n = root nothing happens, so we move to the next node.
2) n.Id = 1; the where condition is met by node 2, as its ParentId is 1. We allocate a new node and point current to node 2.
3) We get to the third node now. n.ParentId = 2 and current.Id = 2. We have a match, so we allocate another node and point current to node 3.
4) We're at the end of the list. The grandchild is never allocated because we're off by one.
Basically you iterate x times, where x is the length of the list, but because current = n on the first iteration you don't allocate a node, so you end up with x - 1 nodes when you're expecting x.
I am working on an ASP.NET page, and there is a tree view in it. In the tree view, some nodes have nested nodes, like branches. I have data in a list of custom objects in the following format:
Id, Description, parentId
Right now, I am using a function to recursively add nodes to the tree view. The following is a code snippet:
private bool findParentAddNode(string id, string description, string parentid, ref List<CustomTreeNode> treeList)
{
    bool isFound = false;
    foreach (CustomTreeNode node in treeList)
    {
        if (node.id == parentid) //if current node is parent node, add in it as its child
        {
            node.addChild(id, description, parentid);
            isFound = true;
            break;
        }
        else if (node.listOfChildNodes != null) //have child nodes
        {
            isFound = findParentAddNode(id, description, parentid, ref node.listOfChildNodes);
            if (isFound)
                break;
        }
    }
    return isFound;
}
The above technique works well but, for more than 30K nodes, its performance is slow. Please suggest an algorithm to replace this recursive call with loops.
As it recurses down the tree, the code is doing a linear search over the lists of child nodes.
This means that for randomly distributed parent ids, after adding N nodes to the tree it will on average search N/2 nodes for the parent before adding the N+1th node. So the cost will be O(N²) on the number of nodes.
Instead of a linear scan, create an index from id to node and use that to find the parent quickly. When you create a node and add it to the tree, also add it to a Dictionary<string, CustomTreeNode>. When you want to add a node to a parent, find the parent in the index and add it there. If addChild returns the child it creates, then the code becomes:
Dictionary<string, CustomTreeNode> index = new Dictionary<string, CustomTreeNode>();

private bool findParentAddNode(string id, string description, string parentid)
{
    CustomTreeNode parentNode;
    if (!index.TryGetValue(parentid, out parentNode))
        return false;
    index[id] = parentNode.addChild(id, description, parentid);
    return true;
}
You will need to add the root of the tree to the index before using findParentAddNode.
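For example, something along these lines before processing the flat list (the exact CustomTreeNode construction and rootId are assumed here, since that class isn't shown):
// Hypothetical setup: register the root in the index so the first
// children can find their parent.
CustomTreeNode root = new CustomTreeNode();   // however your root is created
root.id = rootId;                             // rootId = whatever id the top-level rows point to
index[rootId] = root;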
An iterative version of a breadth-first search should be something like the following:
var rootNodes = new List<CustomTreeNode> { new CustomTreeNode() };
while (rootNodes.Count > 0)
{
    var nextRoots = new List<CustomTreeNode>();
    foreach (var node in rootNodes)
    {
        // process node here
        if (node.listOfChildNodes != null)
            nextRoots.AddRange(node.listOfChildNodes);
    }
    rootNodes = nextRoots;
}
That said, this isn't tested, and since it's a BFS, isn't optimal w/r/t memory. (Memory use is O(n), not O(log n) compared to DFS or iterative-deepening DFS.)
You can return the data in XML format from the SQL Server database using FOR XML, and then bind it to the TreeView control.
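A rough sketch of the binding side, assuming the query returns the hierarchy as a single XML string (GetHierarchyXml and the control names here are placeholders):
// Code-behind sketch: feed the FOR XML result to an XmlDataSource and
// let the ASP.NET TreeView bind to it.
string xml = GetHierarchyXml();        // e.g. a query ending in FOR XML AUTO
XmlDataSource1.Data = xml;
TreeView1.DataSourceID = "XmlDataSource1";
TreeView1.DataBind();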
I'm trying to populate hierarchical data in a .NET 2.0 (yes, 2.0) application and upgrading right now is off the table (so no LINQ, LINQ Bridge, or other things).
I was wondering if there is a better way to populate hierarchical data into this class structure? I'm pretty sure there is a far better way for this to be accomplished.
It would be really nice to see a good way to do this. If anyone has the time to show a .NET 2.0 way and if there is a different way they would do it in .NET 4.0+ that would be great.
Here is an example of the node type structure:
using System.Collections.Generic;

public class ExampleNode
{
    private int _id;
    private Nullable<int> _parentId;
    private int _depth;
    private List<ExampleNode> _children = new List<ExampleNode>();

    public ExampleNode()
    {
    }

    public virtual int Id {
        get { return _id; }
        set { _id = value; }
    }

    public virtual Nullable<int> ParentId {
        get { return _parentId; }
        set { _parentId = value; }
    }

    public virtual int Depth {
        get { return _depth; }
        set { _depth = value; }
    }

    public virtual List<ExampleNode> Children {
        get { return _children; }
        set { _children = value; }
    }
}
Here is an example function that is being utilized to populate the node structure. It seems like it is not the best way to do this, and it has the potential not to populate grandchild-level data. Depth comes back from the stored proc as the level in the hierarchy (items with a level of 0 are top level, a child of a top-level node is at level 1, a grandchild of a top-level node is at level 2, etc.)
public List<ExampleNode> GetNodes()
{
    // This may not be optimal.
    List<ExampleNode> nodeList = new List<ExampleNode>();
    Dictionary<int, ExampleNode> nodeDictionary = new Dictionary<int, ExampleNode>();

    using (SqlDataReader reader = SqlHelper.ExecuteReader(
        ConfigurationManager.ConnectionStrings["SqlServer"].ConnectionString,
        CommandType.StoredProcedure,
        "proc_GetNodeStructure",
        new SqlParameter("@UserId", userId),
        new SqlParameter("@NodeTypeId", nodeType)))
    {
        while (reader.Read())
        {
            ExampleNode nodeInstance = new ExampleNode();
            nodeInstance.Id = Convert.ToInt32(reader["Id"]);
            nodeInstance.Depth = Convert.ToInt32(reader["Depth"]);
            if (!Convert.IsDBNull(reader["ParentId"]))
            {
                nodeInstance.ParentId = Convert.ToInt32(reader["ParentId"]);
            }
            // Add to list
            nodeList.Add(nodeInstance);
            // Add to dictionary
            nodeDictionary.Add(nodeInstance.Id, nodeInstance);
        }
    }

    foreach (ExampleNode item in nodeList)
    {
        if (item.ParentId.HasValue)
        {
            nodeDictionary[item.ParentId.Value].Children.Add(item);
        }
    }

    for (int i = nodeList.Count - 1; i >= 0; i--)
    {
        if (nodeList[i].Depth > 0)
        {
            nodeList.RemoveAt(i);
        }
    }

    return nodeList;
}
If I understand correctly, you
gather the nodes into a list and a dictionary
iterate through the list and arrange parent/child relationships via the dictionary
remove nodes from the list that have a positive depth
... which leaves the list containing the top-most nodes in the hierarchical structure. Your algorithm seems correct to me.
The first two operations are O(n) complexity in time and space with respect to the number of nodes, which is pretty good!
The only real inefficiency is removing elements from the list in step 3. Because the underlying storage is a vector, removing an element from near the front of the list is expensive: all of the remaining elements need to be copied down. You are trying to minimize the amount of such copying by iterating over the list backward, but imagine that the last half of the list is parent nodes and the front half is child nodes: every time you remove a child node, you still have to copy about half the original list. This approaches O(n^2) behaviour.
So for step 3 you have at least two choices if you wish to improve performance in time:
Make a second list that contains only the elements from the first where the depth == 0.
Use a linked list instead, so that deletions are O(1) instead of up to O(n) performance.
Here is the code for the first option:
...
List<ExampleNode> roots = new List<ExampleNode>();
for (int i = 0; i < nodeList.Count; i++) {
    if (nodeList[i].Depth == 0) {
        roots.Add(nodeList[i]);
    }
}
return roots;
You could potentially save even a little more time by counting how many root nodes there are during step 1 or 2, and then initializing the second list so its capacity is equal to the number of root nodes. This will prevent unnecessary allocations and copying of the underlying list vector while you are adding elements to the list.
List<ExampleNode> roots = new List<ExampleNode>(rootCount);
The same applies to the first nodeList; you can delay its construction until you know the number of records returned from the query.
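For the second option, if nodeList were built as a LinkedList<ExampleNode> from the start (it is available in .NET 2.0), step 3 could drop non-root nodes in place. A rough, untested sketch:
// Assumes nodeList is a LinkedList<ExampleNode> rather than a List<ExampleNode>.
LinkedListNode<ExampleNode> current = nodeList.First;
while (current != null)
{
    LinkedListNode<ExampleNode> next = current.Next;
    if (current.Value.Depth > 0)
    {
        nodeList.Remove(current);   // O(1): no shifting of later elements
    }
    current = next;
}
return new List<ExampleNode>(nodeList);   // if the method must still return a List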
What about using NHibernate? It works with .NET 2.0 and later, so you can move forward with it as well.