I am working in C#, trying to improve an algorithm (a different story there), and to do that I need the following data structure: a linked list where each node can have zero or one "follower" (the node to its right). I am still deciding whether more than one follower is necessary.
I could implement these linked lists myself "raw", but I am thinking it would be much better to use one of the available collections (such as List, etc.).
So far I am thinking of building a class "PairClass" which will have a "first element" and a "follower" (the left node and right node). This could change if I decide to include more than one linked node (follower). I would then use a List<PairClass>.
One final consideration is that it would be nice if the collection let me retrieve the follower from the first element efficiently.
Because of this last consideration, I am not sure List<PairClass> would be the best approach.
Can someone advise me on what to use in these cases? I am always open to learning and discussing better ways of doing things. Basically, I am asking for an efficient solution to the problem.
EDIT: (in response to the comments)
How do you identify each node, is there an ID? or will the index in a list suffice?
So far, I am content with using just simple integers. But I guess you are right; you just gave me an idea, and perhaps the solution I need is simpler than I thought!
What are your use cases? How often will you be adding or removing elements? Are you going to iterate over this collection?
I will be adding elements often. The "follower" will likely be replaced often too. Elements are not going to be removed. I am going to iterate over the collection at the end; the reason is that followers will be eliminated as elements of consideration and replaced by their first element.
(Side note.) The reason I am doing this is that I need to modify an algorithm that is taking too much time. The algorithm performs too many scans of an image (which takes time), so I plan to build this structure to solve the problem; therefore speed is a consideration.
You really need to add more details; however, going by your description:
If you don't need to iterate over the list in order
If you have a key for each node
If you want fast lookups
You could use a Dictionary<Key,Node>
Node
public class Node
{
    // examples
    public string Id { get; set; }
    public Node Parent { get; set; }
    public Node Child { get; set; }
    public Node Sibling { get; set; }
}
Option 1
var nodes = new Dictionary<string, Node>();
// add like this
nodes.Add(node.Id, node);
// look up like this
node = nodes[key];
// access its relatives
node.Parent
node.Child
node.Sibling
If you want to iterate over the list often
If the index is all you need to look up the node
Or if you want to query the list via Linq
Option 2
var list = new List<Node>();
// lookup via index
var node = list[index];
// lookup via Linq
var node = list.FirstOrDefault(x => x.Id == someId);
If it is a single-follower scenario, I would suggest a dictionary of lists as a candidate: the dictionary gives you fast access by key (vertically), and with a single follower per node a linked list handles the chain easily.
If it is a multiple-follower scenario, I would suggest a dictionary of dictionaries, which makes the whole collection fast to access both vertically and horizontally.
Saruman gave a fairly good example implementation.
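For the single-follower case, a minimal sketch of the dictionary-of-lists idea, assuming plain integer node IDs as the asker mentioned (all names here are illustrative):
// Each key is a "first element"; the linked list holds its follower chain.
var followers = new Dictionary<int, LinkedList<int>>();

// Attach follower 9 to node 5, creating the chain on first use.
LinkedList<int> chain;
if (!followers.TryGetValue(5, out chain))
{
    chain = new LinkedList<int>();
    followers.Add(5, chain);
}
chain.AddLast(9);

// O(1) average-time lookup of the followers given the first element.
foreach (int follower in followers[5])
    Console.WriteLine(follower);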
I've got a bit of a dilemma. I have to do some sorting of a list. There are 2 lists that users can select from, and they then select one of the elements in that list to sort on. Unfortunately for me, the second list is a child list within the first list.
The child list requires slightly different logic than if the user had just chosen from the parent list. I have the logic down to sort on either the parent and/or the child list using LINQ, so I'm not too worried about that part.
There is also the option to choose ascending or descending order, to make matters worse, at least for me. I've gone through the logic, and it looks as though there will be a total of 64 if/else statements needed to cover all the scenarios.
My first reaction was that this wasn't a very good way to go about it, as that seems like a lot of if/else statements. Is there a better way, or do I just need to bite the bullet?
Logic for Parent:
Positions.Select(x => x.Product).OrderBy(x=>x.Price).AsQueryable();
Logic for Child:
Positions.Select(x => x.Product).OrderBy(x=>x.Performance.OrderBy(c=>c.AssortmentCategory).Select(c=>c.AssortmentCategory).FirstOrDefault()).AsQueryable();
Positions and Performance are both ListExtended instances, if that matters. Also, I'm using the Dynamic LINQ library, as I will be getting user input, though that is not shown above.
Edit1: I forgot to say this sorts only the parent list. Even if they choose an element from the child list, it will sort the parent list.
I would have a lookup dictionary that maps the selection from the first list (string, int? not sure what type you need) to a Func<> and do
var sortedData = sortDict[selected](args....)
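A minimal sketch of that idea, reusing the Product and Price names from the question (the key format and delegate shape are assumptions):
// One delegate per sortable column/direction replaces one branch per scenario.
var sortDict = new Dictionary<string, Func<IEnumerable<Product>, IOrderedEnumerable<Product>>>
{
    ["PriceAsc"]  = seq => seq.OrderBy(p => p.Price),
    ["PriceDesc"] = seq => seq.OrderByDescending(p => p.Price),
    // ... add the remaining columns here
};

var sortedData = sortDict[selected](Positions.Select(x => x.Product));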
I've read on here that iterating through a dictionary is generally considered abusing the data structure, and that you should use something else.
However, I'm having trouble coming up with a better way to accomplish what I'm trying to do.
When a tag is scanned, I use its ID as the key, and the value is a list of zones it was seen in. About every second I check whether a tag in my dictionary has been seen in two or more zones and, if it has, queue it up for some calculations.
for (int i = 0; i < TagReads.Count; i++)
{
    var tag = TagReads.ElementAt(i).Value;
    if (tag.ZoneReads.Count > 1)
    {
        Report.Tags.Enqueue(tag);
        Boolean success = false;
        do
        {
            success = TagReads.TryRemove(tag.Epc, out TagInfo outTag);
        } while (!success);
    }
}
I feel like a dictionary is the correct choice here because there can be many tags to look up, but something about this code nags at me as being poor.
As far as efficiency goes, the speed is fine for now in our small-scale test environment, but I don't have a good way to find out how it will behave at a massive scale until it is put to use; hence my concern.
I believe that there's an alternative approach which doesn't involve iterating a big dictionary.
First of all, you need to create a HashSet<T> of tags in which you'll store those tags that have been detected in more than one zone. We'll call it tagsDetectedInMoreThanOneZone.
And you may refactor your code flow as follows:
A. Whenever you detect a tag in one zone...
Add the tag and the zone to the main dictionary.
Take an exclusive lock on tagsDetectedInMoreThanOneZone to avoid undesired behavior in B.
Check whether the key now has more than one zone. If it does, add it to tagsDetectedInMoreThanOneZone.
Release the lock on tagsDetectedInMoreThanOneZone.
B. Whenever you need to process a tag which has been detected in more than one zone...
Take an exclusive lock on tagsDetectedInMoreThanOneZone so that no more than one thread tries to process them.
Iterate tagsDetectedInMoreThanOneZone.
Use each tag in tagsDetectedInMoreThanOneZone to get the zones from your current dictionary.
Clear tagsDetectedInMoreThanOneZone.
Release the exclusive lock on tagsDetectedInMoreThanOneZone.
Now you'll iterate only those tags that you already know have been detected in more than one zone!
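A sketch of steps A and B (TagInfo, TagReads, and Report come from the question; the set, the lock object, and the two method names are additions of mine):
// The set tracks tags already known to span two or more zones.
private readonly HashSet<TagInfo> tagsDetectedInMoreThanOneZone = new HashSet<TagInfo>();
private readonly object setLock = new object();

public void OnTagRead(TagInfo tag, int zone)   // step A
{
    tag.ZoneReads.Add(zone);                   // update the main dictionary entry
    lock (setLock)
    {
        if (tag.ZoneReads.Count > 1)
            tagsDetectedInMoreThanOneZone.Add(tag);
    }
}

public void ProcessPendingTags()               // step B, run about every second
{
    lock (setLock)
    {
        foreach (var tag in tagsDetectedInMoreThanOneZone)
            Report.Tags.Enqueue(tag);          // no dictionary scan needed
        tagsDetectedInMoreThanOneZone.Clear();
    }
}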
If you are going to do a lot of lookups in your code and only sometimes iterate through the whole thing, then I think the dictionary use is OK. I would like to point out, though, that your use of ElementAt is more alarming. ElementAt performs very poorly when used on objects that do not implement IList<T>, and the dictionary does not. For an IEnumerable<T> that does not implement IList<T>, the nth element is found through iteration, so your for loop iterates the dictionary once for each element. You are better off with a standard foreach.
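For example, a sketch of the foreach replacement (assuming TagReads is a ConcurrentDictionary<string, TagInfo>, which the TryRemove call suggests):
// ToArray() takes a snapshot, so removing entries inside the loop is safe.
foreach (var pair in TagReads.ToArray())
{
    if (pair.Value.ZoneReads.Count > 1)
    {
        Report.Tags.Enqueue(pair.Value);
        TagReads.TryRemove(pair.Key, out TagInfo removed);
    }
}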
I feel like this is a good use for a dictionary, giving you good access speed when you want to check if an ID is already in the collection.
This is an algorithmic question.
I have got a Dictionary<object,Queue<object>>. Each queue contains one or more elements. I want to remove all queues with only one element from the dictionary. What is the fastest way to do it?
Pseudo-code: foreach(item in dict) if(item.Length==1) dict.Remove(item);
It is easy to do it in a loop (not foreach, of course), but I'd like to know which approach is the fastest one here.
Why I want it: I use that dictionary to find duplicate elements in a large set of objects. The Key in the dictionary is a kind of hash of the object; the Value is a queue of all objects found with the same hash. Since I want only duplicates, I need to remove all items with just a single object in the associated queue.
Update:
It may be important to know that in the typical case there are just a few duplicates in a large set of objects. Let's assume 1% or less. So it could possibly be faster to leave the Dictionary as is and create a new one from scratch with just the selected elements from the first one... and then delete the first Dictionary completely. I think it depends on the computational complexity of the Dictionary class's methods used in the particular algorithms.
I really want to see this problem on a theoretical level because, as a teacher, I want to discuss it with students. I didn't provide any concrete solution myself because I think it is really easy to do. The question is which approach is the best, the fastest.
var itemsWithOneEntry = dict.Where(x => x.Value.Count == 1)
                            .Select(x => x.Key)
                            .ToList();

foreach (var item in itemsWithOneEntry)
{
    dict.Remove(item);
}
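Alternatively, following the update's observation that 1% or fewer of the entries are duplicates, rebuilding may beat removal; a minimal sketch:
// Keep only the entries whose queue holds more than one object and let the
// old dictionary be garbage collected.
dict = dict.Where(pair => pair.Value.Count > 1)
           .ToDictionary(pair => pair.Key, pair => pair.Value);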
Instead of trying to optimize the traversal of the collection, how about optimizing the contents of the collection so that it only includes the duplicates? This would require changing your collection algorithm to something like this:
var duplicates = new Dictionary<object, Queue<object>>();
var possibleDuplicates = new Dictionary<object, object>();
foreach (var item in original)
{
    if (possibleDuplicates.ContainsKey(item))
    {
        // second sighting: promote the pair to the duplicates dictionary
        duplicates.Add(item, new Queue<object>(new[] { possibleDuplicates[item], item }));
        possibleDuplicates.Remove(item);
    }
    else if (duplicates.ContainsKey(item))
    {
        // third or later sighting
        duplicates[item].Enqueue(item);
    }
    else
    {
        // first sighting
        possibleDuplicates.Add(item, item);
    }
}
Note that you should probably measure the impact of this on the performance in a realistic scenario before you bother to make your code any more complex than it really needs to be. Most imagined performance problems are not in fact the real cause of slow code.
But supposing you do find that you could get a speed advantage by avoiding a linear search for queues of length 1, you could solve this problem with a technique called indexing.
As well as your dictionary containing all the queues, you maintain an index container (probably another dictionary) that only contains the queues of length 1, so when you need them they are already available separately.
To do this, you need to enhance all the operations that modify the length of the queue, so that they have the side-effect of updating the index container.
One way to do it is to define a class ObservableQueue. This would be a thin wrapper around Queue except it also has a ContentsChanged event that fires when the number of items in the queue changes. Use ObservableQueue everywhere instead of the plain Queue.
Then, when you create a new queue, subscribe to its ContentsChanged event a handler that checks whether the queue has exactly one item. Based on that, you can either insert the queue into the index container or remove it.
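A minimal sketch of that wrapper (the event name ContentsChanged comes from the answer; the rest is illustrative):
public class ObservableQueue<T>
{
    private readonly Queue<T> _inner = new Queue<T>();

    // Fires whenever the number of items in the queue changes.
    public event Action<ObservableQueue<T>> ContentsChanged;

    public int Count { get { return _inner.Count; } }

    public void Enqueue(T item)
    {
        _inner.Enqueue(item);
        RaiseContentsChanged();
    }

    public T Dequeue()
    {
        T item = _inner.Dequeue();
        RaiseContentsChanged();
        return item;
    }

    private void RaiseContentsChanged()
    {
        var handler = ContentsChanged;
        if (handler != null) handler(this);
    }
}

A handler would then add the queue to the index container when Count is exactly one and remove it otherwise.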
I was recently profiling an application trying to work out why certain operations were extremely slow. One of the classes in my application is a collection based on LinkedList. Here's a basic outline, showing just a couple of methods and some fluff removed:
public class LinkInfoCollection : PropertyNotificationObject, IEnumerable<LinkInfo>
{
    private LinkedList<LinkInfo> _items;

    public LinkInfoCollection()
    {
        _items = new LinkedList<LinkInfo>();
    }

    public void Add(LinkInfo item)
    {
        _items.AddLast(item);
    }

    public LinkInfo this[Guid id]
    {
        get { return _items.SingleOrDefault(i => i.Id == id); }
    }
}
The collection is used to store hyperlinks (represented by the LinkInfo class) in a single list. However, each hyperlink also has a list of hyperlinks which point to it, and a list of hyperlinks which it points to. Basically, it's a navigation map of a website. As this means you can have infinite recursion when links point back to each other, I implemented this as a linked list - as I understand it, that means for every hyperlink, no matter how many times it is referenced by another hyperlink, there is only ever one copy of the object.
The ID property in the above example is a GUID.
With that long-winded description out of the way, my problem is simple: according to the profiler, when constructing this map for a fairly small website, the indexer referred to above is called no fewer than 27,906 times, which is an extraordinary amount. I still need to work out if it's really necessary for it to be called that many times, but at the same time I would like to know if there's a more efficient way of implementing the indexer, as this is the primary bottleneck identified by the profiler (also assuming it isn't lying!). I still need the linked-list behaviour, as I certainly don't want more than one copy of these hyperlinks floating around killing my memory, but I also need to be able to access them by a unique key.
Does anyone have any advice on improving the performance of this indexer? I also have another indexer which uses a URI rather than a GUID, but that one is less problematic, as building the incoming/outgoing links is done by GUID.
Thanks,
Richard Moss
You should use a Dictionary<Guid, LinkInfo>.
You don't need to use LinkedList in order to have only one copy of each LinkInfo in memory. Remember that LinkInfo is a managed reference type, and so you can place it in any collection, and it'll just be a reference to the object that gets placed in the list, not a copy of the object itself.
That said, I'd implement the LinkInfo class as containing two lists of Guids: one for the things this links to, one for the things linking to this. I'd have just one Dictionary<Guid, LinkInfo> to store all the links. Dictionary is a very fast lookup, I think that'll help with your performance.
The fact that this[] is getting called 27,000 times doesn't seem like a big deal to me, but what's making it show up in your profiler is probably the SingleOrDefault call on the LinkedList. Linked lists are best for situations where you need fast insertions & removals, particularly in the middle of the list. For quick lookups, which is probably more important here, let the Dictionary do its work with hash tables.
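A sketch of that change against the class from the question (PropertyNotificationObject omitted for brevity; the O(1) TryGetValue lookup replaces the O(n) SingleOrDefault scan):
public class LinkInfoCollection : IEnumerable<LinkInfo>
{
    private readonly Dictionary<Guid, LinkInfo> _items = new Dictionary<Guid, LinkInfo>();

    public void Add(LinkInfo item)
    {
        _items.Add(item.Id, item);
    }

    public LinkInfo this[Guid id]
    {
        get
        {
            LinkInfo item;
            _items.TryGetValue(id, out item);
            return item;   // null when absent, like SingleOrDefault
        }
    }

    public IEnumerator<LinkInfo> GetEnumerator()
    {
        return _items.Values.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}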
I need to find a path or paths down a complicated graph structure. The graph is built using something similar to this:
class Node
{
    public string Value { get; set; }
    public List<Node> Nodes { get; set; }

    public Node()
    {
        Nodes = new List<Node>();
    }
}
What makes this complicated is that the nodes can reference back to an earlier node. For example,
A -> C -> E -> A
What I need to do is get a list of stacks which represent paths through the nodes until I get to a node with a specific value. Since it's possible for some very long paths to exist, we can set a maximum number of nodes to try.
List<Stack<Node>> paths = FindPaths(string ValueToFind, int MaxNumberNodes);
Does anyone have a way to build this (or something similar)? I've done recursion in the past, but I'm having a total brain fart thinking about this for some reason. My question specified a lambda expression, but using a lambda is not strictly required. I'd be grateful for any solution.
Side note: I lifted the class from aku's excellent answer to this recursion question. While his elegant solution shown below traverses the tree structure, it doesn't seem to allow enough flexibility to do what I need (for example, dismissing circular paths and tracking successful ones).
Action<Node> traverse = null;
traverse = (n) => { Console.WriteLine(n.Value); n.Nodes.ForEach(traverse);};
traverse(root); // where root is the tree structure
Edit:
Based on input from the comments and answers below I found an excellent solution over in CodeProject. It uses the A* path finding algorithm. Here is the link.
If your issue is related to pathfinding, you may want to google "A star" or "A*".
It's a common and efficient pathfinding algorithm. See this article for an example directly related to your problem.
You may also want to look at Dijkstra's algorithm.
I'm not sure whether your intended output is all paths to the goal, the best path to the goal (by some metric, e.g. path length), or just any path to the goal.
Assuming the latter, I'd start with the recursive strategy, including tracking of visited nodes as outlined by Brann, and make these changes:
Add parameters to represent the goal being sought, the collection of successful paths, and the current path from the start.
When entering a node that matches the goal, add the current path (plus the current node) to the list of successful paths.
Extend the current path with the current node to create the path passed on any recursive calls.
Invoke the initial ExploreGraph call with an empty path and an empty list of successful paths.
Upon completion, your algorithm will have traversed the entire graph, and distinct paths to the goal will have been captured.
That's just a quick sketch, but you should be able to flesh it out for your specific needs.
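One possible fleshing-out of that sketch, using the Node class from the question (the method and parameter names are mine, and this variant stops extending a path once the goal is found):
static List<Stack<Node>> FindPaths(Node root, string valueToFind, int maxNodes)
{
    var paths = new List<Stack<Node>>();
    Explore(root, valueToFind, maxNodes, new List<Node>(), paths);
    return paths;
}

static void Explore(Node current, string goal, int maxNodes,
                    List<Node> currentPath, List<Stack<Node>> paths)
{
    if (currentPath.Contains(current)) return;    // dismiss circular paths
    if (currentPath.Count >= maxNodes) return;    // respect the node cap

    currentPath.Add(current);                     // extend the current path
    if (current.Value == goal)
    {
        paths.Add(new Stack<Node>(currentPath));  // record a successful path
    }
    else
    {
        foreach (var child in current.Nodes)
            Explore(child, goal, maxNodes, currentPath, paths);
    }
    currentPath.RemoveAt(currentPath.Count - 1);  // backtrack
}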
I don't know exactly what you want to achieve, but this circular-reference problem is usually solved by tagging already-visited nodes.
Just use a Dictionary to keep track of the nodes which have already been visited so that you don't loop.
Example:
public void ExploreGraph(TreeNode tn, Dictionary<TreeNode, bool> visitedNodes)
{
    foreach (TreeNode childNode in tn.Nodes)
    {
        if (!visitedNodes.ContainsKey(childNode))
        {
            visitedNodes.Add(childNode, true);   // mark as visited
            ExploreGraph(childNode, visitedNodes);
        }
    }
}