I want to make several modifications to a Roslyn syntax tree at once, all around the same area of code:
tree = tree.ReplaceNodes(oldNode, newNode).RemoveNode(toRemove);
However, only the first modification succeeds. It seems that the first change changes all the nodes around it, so the RemoveNode method no longer finds toRemove in the resulting tree. I really, really don't want to redo the work of recalculating toRemove in the new tree, and using a single SyntaxRewriter to perform all the work (overriding the DefaultVisit method) is ridiculously slow.
How can I do what I want?
Before I offer a few alternatives, your comment that a SyntaxRewriter is "ridiculously slow" is a bit surprising. When you say "slow", do you mean "it's a lot of code to write" or "it's performing terribly"? That is the fastest (execution-time-wise) way to do multiple replacements, and both ReplaceNodes and RemoveNode use a rewriter internally. If you are having performance problems, make sure that your DefaultVisit only visits a node's children if the nodes you're interested in are under the node it's called on. The simple trick is to compare spans and check that the span of the node passed in intersects the spans of the nodes you are processing.
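A minimal sketch of the span check, assuming the Microsoft.CodeAnalysis.CSharp package and a rewriter that replaces a single node. The class name SpanLimitedRewriter is made up for illustration, and it overrides Visit rather than DefaultVisit, which is one way (not necessarily the asker's way) to cut off whole subtrees early:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical rewriter: replaces oldNode with newNode and skips every
// subtree whose span cannot contain oldNode.
public sealed class SpanLimitedRewriter : CSharpSyntaxRewriter
{
    private readonly SyntaxNode _oldNode;
    private readonly SyntaxNode _newNode;

    public SpanLimitedRewriter(SyntaxNode oldNode, SyntaxNode newNode)
    {
        _oldNode = oldNode;
        _newNode = newNode;
    }

    public override SyntaxNode Visit(SyntaxNode node)
    {
        if (node == null)
            return null;

        // Visit is called with nodes of the original tree, so reference
        // equality against the original target works here.
        if (node == _oldNode)
            return _newNode;

        // No overlap with the target: return the subtree untouched
        // instead of walking all of its descendants.
        if (!node.FullSpan.IntersectsWith(_oldNode.FullSpan))
            return node;

        return base.Visit(node);
    }
}
```

Usage would be along the lines of new SpanLimitedRewriter(oldNode, newNode).Visit(root). With several targets, the same test can be run against each target span.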
Anyways, SyntaxAnnotations provide a useful way to locate nodes in trees after a modification. You can just create an instance of the type, and attach it to a node with the WithAdditionalAnnotations extension method. You can locate the node again with the GetAnnotatedNodesOrTokens method (current Roslyn releases expose this as GetAnnotatedNodes and GetAnnotatedNodesAndTokens).
So one way to approach your problem is to annotate your toRemove, and then when you call ReplaceNodes do two replacements in the same call -- one to do the oldNode -> newNode replacement and then one to do the toRemove -> toRemoveWithAnnotation replacement. Then find the annotated node in the resulting tree and call RemoveNode.
If you know that oldNode and toRemove aren't ancestors of each other (i.e. they're in unrelated parts of the tree), another option would be to reverse the ordering. Grab the parent node (call it oldNodeParent) of toRemove and call RemoveNode, meaning you get an updated parent node (call it oldNodeParentRewritten). Then, call ReplaceNodes doing two replacements: oldNode -> newNode and oldNodeParent -> oldNodeParentRewritten. No annotations needed.
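The annotation approach could be sketched roughly as follows, assuming the Microsoft.CodeAnalysis.CSharp package. The sample source text, the choice of method declarations as oldNode and toRemove, and the AnnotationDemo.Rewrite name are all invented for illustration:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class AnnotationDemo
{
    // Renames the first method and removes the second one.
    public static string Rewrite(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();
        var methods = root.DescendantNodes().OfType<MethodDeclarationSyntax>().ToList();

        var oldNode = methods[0];
        var toRemove = methods[1];
        var newNode = oldNode.WithIdentifier(SyntaxFactory.Identifier("Renamed"));
        var marker = new SyntaxAnnotation();

        // One ReplaceNodes call performs both replacements:
        // oldNode -> newNode, and toRemove -> toRemove carrying the annotation.
        root = root.ReplaceNodes(
            new[] { oldNode, toRemove },
            (original, rewritten) => original == oldNode
                ? newNode
                : rewritten.WithAdditionalAnnotations(marker));

        // The annotation survives the rewrite, so the node can be found again.
        var annotated = root.GetAnnotatedNodes(marker).Single();
        var result = root.RemoveNode(annotated, SyntaxRemoveOptions.KeepNoTrivia);
        return result?.ToFullString() ?? string.Empty;
    }
}
```

For example, AnnotationDemo.Rewrite("class C { void A() {} void B() {} }") should return source text in which A has been renamed and B is gone.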
I've got a bit of a dilemma. I have to do some sorting of a list. There are 2 lists that users can select from and then select one of the elements in that list to sort on. Unfortunately for me, the second list is a child list, within the first list.
The child list will require slightly different logic other than if the user just chose from the parent list. I have the logic down to sort on either the parent and/or the child list using LINQ so I'm not too worried about the logic of it.
There is also the option to choose ascending or descending order, which makes matters worse, at least for me. I've gone through the logic and it looks as though there will be a total of 64 if/else statements that I will need in order to cover all the scenarios.
My first reaction was that this isn't a very good way to go about it, as it seems like a lot of if/else statements. Is there a better way to go about this, or do I just need to bite the bullet?
Logic for Parent:
Positions.Select(x => x.Product).OrderBy(x=>x.Price).AsQueryable();
Logic for Child:
Positions.Select(x => x.Product).OrderBy(x=>x.Performance.OrderBy(c=>c.AssortmentCategory).Select(c=>c.AssortmentCategory).FirstOrDefault()).AsQueryable();
Positions and Performance are both ListExtended's, if that matters. Also, I'm using the Dynamic LINQ library as I will be getting user input, though that is not shown above.
Edit1: I forgot to say this sorts only the parent list. Even if they choose an element from the child list, it will sort the parent list.
I would have a lookup dictionary that maps the selection from the first list (string, int, or whatever type you need) to a Func<> and do:
var sortedData = sortDict[selected](args....)
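A minimal sketch of such a lookup, under the assumption that the user's choice arrives as a string. The Product and PerformanceEntry records, the field names, and the ProductSorter class are stand-ins invented for illustration, not the asker's real types. Keeping the direction as a separate flag means each field needs one dictionary entry rather than one branch per field and direction:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the asker's types.
public record PerformanceEntry(string AssortmentCategory);
public record Product(string Name, decimal Price, List<PerformanceEntry> Performance);

public static class ProductSorter
{
    // One key selector per sortable field, for parent and child fields alike.
    private static readonly Dictionary<string, Func<Product, IComparable>> KeySelectors =
        new Dictionary<string, Func<Product, IComparable>>
        {
            ["Price"] = p => p.Price,
            ["AssortmentCategory"] = p => p.Performance
                .Select(c => c.AssortmentCategory)
                .OrderBy(c => c)
                .FirstOrDefault(),
        };

    public static List<Product> Sort(IEnumerable<Product> products, string field, bool descending)
    {
        var key = KeySelectors[field];
        var ordered = descending
            ? products.OrderByDescending(key)
            : products.OrderBy(key);
        return ordered.ToList();
    }
}
```

A call such as ProductSorter.Sort(products, "Price", descending: false) then replaces the corresponding if/else branch.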
ImmutableSortedSet has an IndexOf and I can enumerate the set with a normal for-loop.
But according to the source code for ImmutableSortedSet, it looks like it has to do a tree search for every index lookup. If I have thousands of items in the set, this probably won't be very efficient (and it is quite unnecessary).
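For reference, the index-based loop described here might look like the sketch below. The extension method name EnumerateFrom is made up for illustration. This is the baseline being questioned, not a faster alternative: every set[i] access performs its own lookup in the tree.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

public static class SortedSetExtensions
{
    // Enumerates the set starting at 'start' (or at the next larger element
    // if 'start' is not present), using IndexOf plus the indexer.
    public static IEnumerable<T> EnumerateFrom<T>(this ImmutableSortedSet<T> set, T start)
    {
        int index = set.IndexOf(start);
        if (index < 0)
        {
            // Not found: IndexOf returns the bitwise complement of the
            // index where the item would be inserted.
            index = ~index;
        }

        for (int i = index; i < set.Count; i++)
        {
            yield return set[i]; // each access walks the tree again
        }
    }
}
```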
Can I somehow get an enumerator from a certain index and forwards? I mean one that just traverses the internal node tree more efficiently?
Perhaps some way of hacking the enumerator with reflection? I kind of want it to fast-forward to a certain node and then continue the enumeration from there.
I have attempted to manipulate the "_reverse"-field in the enumerator to first "guide" it to the first node. But I couldn't get the field-change to stick.
Another idea is to populate the node stack initially as if it were already halfway through the enumeration, at the point where I want it to start. I just don't know if that is possible.
I am working with C# and now trying to improve an algorithm (different story there), and to do that I need to have this data structure:
As you can see it is a linked list, where each node can have zero or one "follower"(the right ones). I am still thinking if more than one is necessary.
I could implement these linked lists by myself "raw" but I am thinking it would be much better if I use a collection from the ones available (such as List etc).
So far I am thinking of building a class "PairClass" which will have a "first element" and a "follower" (the left node and the right node). This could change if I decide to include more than one linked node (follower). I would then use a List<PairClass>.
One final consideration is that it would be nice if the data collection permits me to get the follower by giving the first element in an efficient manner.
Due to this last consideration, I am not sure if List<PairClass> would be the best approach.
Can someone advise me on what to use in these cases? I am always open to learning and discussing better ways of doing things. Basically, I am asking for an efficient solution to the problem.
EDIT: (in response to the comments)
How do you identify each node, is there an ID? or will the index in a list suffice?
So far, I am content with using just simple integers. But I guess you are right, you just give me an idea and perhaps the solution I need is simpler than I thought!
What are your use cases? How often will you be adding or removing elements? Are you going to iterate over this collection?
I will be adding elements often. The "follower" would likely be replaced often too. The elements are not going to be removed. I am going to iterate over this collection in the end, because followers are going to be eliminated as elements of consideration and replaced by their first element.
(As an aside: the reason I am doing this is that I need to modify an algorithm that is taking too much time. The algorithm performs too many scans of an image, which takes time, so I plan to build this structure to solve the problem. Speed is therefore a consideration.)
You really need to add more details, however by your description
If you don't need to iterate over the list in order
If you have a key for each node
If you want fast lookups
You could use a Dictionary<Key,Node>
Node
public class Node
{
    // examples
    public string Id { get; set; }
    public Node Parent { get; set; }
    public Node Child { get; set; }
    public Node Sibling { get; set; }
}
Option 1
var nodes = new Dictionary<string, Node>();
// add like this
nodes.Add(node.Id, node);
// look up like this
node = nodes[key];
// access its relatives
var parent = node.Parent;
var child = node.Child;
var sibling = node.Sibling;
If you want to iterate over the list often
If the index is all you need to look up the node
Or if you want to query the list via Linq
Option 2
var list = new List<Node>();
// lookup via index
var byIndex = list[index];
// lookup via Linq
var byId = list.FirstOrDefault(x => x.Id == someId);
In a single-follower scenario, I would suggest a dictionary of lists as a possible candidate: the dictionary makes vertical access fast, and with only a single follower per element you can easily use a linked list for each entry.
In a multiple-follower scenario, I would suggest a dictionary of dictionaries, which makes the whole collection fast to access both vertically and horizontally.
Saruman gave a fairly good example of implementation.
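As a rough sketch of the dictionary-based idea, assuming elements are identified by plain integers as the question suggests: the FollowerMap class and its method names are invented for illustration, and it uses a list per key so that zero, one, or several followers all fit the same shape.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container: maps each first element to its follower(s).
public sealed class FollowerMap
{
    private readonly Dictionary<int, List<int>> _followers = new Dictionary<int, List<int>>();

    // Appends a follower to the given first element.
    public void AddFollower(int first, int follower)
    {
        if (!_followers.TryGetValue(first, out var list))
        {
            list = new List<int>();
            _followers[first] = list;
        }
        list.Add(follower);
    }

    // Replaces whatever followers the element had with a single new one.
    public void ReplaceFollowers(int first, int follower)
    {
        _followers[first] = new List<int> { follower };
    }

    // Average O(1) lookup of followers by first element.
    public IReadOnlyList<int> GetFollowers(int first)
    {
        return _followers.TryGetValue(first, out var list)
            ? list
            : (IReadOnlyList<int>)Array.Empty<int>();
    }
}
```

Looking up a follower by its first element is a single hash lookup here, whereas a List<PairClass> would need a linear scan.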
I'm making a jquery clone for C#. Right now I've got it set up so that every method is an extension method on IEnumerable<HtmlNode> so it works well with existing projects that are already using HtmlAgilityPack. I thought I could get away without preserving state... however, then I noticed jQuery has two methods .andSelf and .end which "pop" the most recently matched elements off an internal stack. I can mimic this functionality if I change my class so that it always operates on SharpQuery objects instead of enumerables, but there's still a problem.
With JavaScript, you're given the HTML document automatically, but when working in C# you have to explicitly load it, and you could use more than one document if you wanted. It appears that when you call $('xxx') you're essentially creating a new jQuery object and starting fresh with an empty stack. In C#, you wouldn't want to do that, because you don't want to reload/refetch the document from the web. So instead, you load it once either into a SharpQuery object, or into a list of HtmlNodes (you just need the DocumentNode to get started).
In the jQuery docs, they give this example
$('ul.first').find('.foo')
    .css('background-color', 'red')
  .end().find('.bar')
    .css('background-color', 'green')
  .end();
I don't have an initializer method because I can't overload the () operator, so you just start with sq.Find() instead, which operates on the root of the document, essentially doing the same thing. But then people are going to try and write sq.Find() on one line, and then sq.Find() somewhere down the road, and (rightfully) expect it to operate on the root of the document again... but if I'm maintaining state, then you've just modified the context after the first call.
So... how should I design my API? Do I add another Init method that all queries should begin with that resets the stack (but then how do I force them to start with that?), or add a Reset() that they have to call at the end of their line? Do I overload the [] instead and tell them to start with that? Do I say "forget it, no one uses those state-preserved functions anyway?"
Basically, how would you like that jQuery example to be written in C#?
1. sq["ul.first"].Find(".foo") ...
   Downfalls: Abuses the [] property.
2. sq.Init("ul.first").Find(".foo") ...
   Downfalls: Nothing really forces the programmer to start with Init, unless I add some weird "initialized" mechanism; user might try starting with .Find and not get the result he was expecting. Also, Init and Find are pretty much identical anyway, except the former resets the stack too.
3. sq.Find("ul.first").Find(".foo") ... .ClearStack()
   Downfalls: programmer may forget to clear the stack.
4. Can't do it.
   end() not implemented.
5. Use two different objects.
   Perhaps use HtmlDocument as the base that all queries should begin with, and then every method thereafter returns a SharpQuery object that can be chained. That way the HtmlDocument always maintains the initial state, but the SharpQuery objects may have different states. This unfortunately means I have to implement a bunch of stuff twice (once for HtmlDocument, once for the SharpQuery object).
6. new SharpQuery(sq).Find("ul.first").Find(".foo") ...
   The constructor copies a reference to the document, but resets the stack.
I think the major stumbling block you're running into here is that you're trying to get away with just having one SharpQuery object for each document. That's not how jQuery works; in general, jQuery objects are immutable. When you call a method that changes the set of elements (like find or end or add), it doesn't alter the existing object, but returns a new one:
var theBody = $('body');
// $('body')[0] is the <body>
theBody.find('div').text('This is a div');
// $('body')[0] is still the <body>
(see the documentation of end for more info)
SharpQuery should operate the same way. Once you create a SharpQuery object with a document, method calls should return new SharpQuery objects, referencing a different set of elements of the same document. For instance:
var sq = SharpQuery.Load(new Uri("http://api.jquery.com/category/selectors/"));
var header = sq.Find("h1"); // doesn't change sq
var allTheLinks = sq.Find(".title-link"); // all .title-link in the whole document; also doesn't change sq
var someOfTheLinks = header.Find(".title-link"); // just the .title-link in the <h1>; again, doesn't change sq or header
The benefits of this approach are several. Because sq, header, allTheLinks, etc. are all the same class, you only have one implementation of each method. Yet each of these objects references the same document, so you don't have multiple copies of each node, and changes to the nodes are reflected in every SharpQuery object on that document (e.g. after allTheLinks.text("foo"), someOfTheLinks.text() == "foo").
Implementing end and the other stack-based manipulations also becomes easy. As each method creates a new, filtered SharpQuery object from another, it retains a reference to that parent object (allTheLinks to header, header to sq). Then end is as simple as returning a new SharpQuery containing the same elements as the parent, like:
public SharpQuery end()
{
    return new SharpQuery(this.parent.GetAllElements());
}
(or however your syntax shakes out.)
I think this approach will get you the most jQuery-like behavior, with a fairly easy implementation. I'll definitely be keeping an eye on this project; it's a great idea.
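To make the parent-reference idea concrete, here is a self-contained sketch. It uses a tiny made-up Element type instead of HtmlAgilityPack's HtmlNode, matches on tag names only rather than real selectors, and its End simply hands back the parent object (safe because nothing is ever mutated) instead of building a new wrapper. All of these are simplifying assumptions, not the real SharpQuery design.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an HTML node.
public sealed class Element
{
    public string Tag { get; }
    public List<Element> Children { get; } = new List<Element>();

    public Element(string tag, params Element[] children)
    {
        Tag = tag;
        Children.AddRange(children);
    }

    public IEnumerable<Element> Descendants() =>
        Children.SelectMany(c => new[] { c }.Concat(c.Descendants()));
}

// Immutable query object: every operation returns a new instance
// that remembers the instance it was derived from.
public sealed class SharpQuery
{
    private readonly IReadOnlyList<Element> _elements;
    private readonly SharpQuery _parent; // null for the document-level query

    private SharpQuery(IEnumerable<Element> elements, SharpQuery parent)
    {
        _elements = elements.ToList();
        _parent = parent;
    }

    public static SharpQuery Load(Element root) => new SharpQuery(new[] { root }, null);

    public int Count => _elements.Count;

    // Simplification: matches descendants by tag name only.
    public SharpQuery Find(string tag) =>
        new SharpQuery(
            _elements.SelectMany(e => e.Descendants()).Where(e => e.Tag == tag).Distinct(),
            this);

    // Steps back to the set this one was filtered from.
    public SharpQuery End() => _parent ?? this;
}
```

With this shape, sq.Find("div").Find("a").End() yields the div set again, and sq itself never changes no matter how many chains start from it.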
I would lean towards a variant on option 2. In jQuery, $() is a function call. C# doesn't have global functions; a static method call is the closest equivalent. I would use a method name that indicates you're creating a wrapper, like:
SharpQuery.Create("ul.first").Find(".foo")
I wouldn't be concerned about shortening SharpQuery to sq since intellisense means users won't have to type the whole thing (and if they have resharper they only need to type SQ anyways).
I have an object graph wherein each child object contains a property that refers back to its parent. Are there any good strategies for ignoring the parent references in order to avoid infinite recursion? I have considered adding a special [Parent] attribute to these properties or using a special naming convention, but perhaps there is a better way.
If the loops can be generalised (you can have any number of elements making up the loop), you can keep track of objects you've seen already in a HashSet and stop if the object is already in the set when you visit it. Or add a flag to the objects which you set when you visit it (but you then have to go back & unset all the flags when you're done, and the graph can only be traversed by a single thread at a time).
Alternatively, if the loops will only be back to the parent, you can keep a reference to the parent and not loop on properties that refer back to it.
For simplicity, if you know the parent reference will have a certain name, you could just not loop on that property :)
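The attribute idea from the question could be sketched like this. ParentAttribute, TreeNode and Walker are all invented names, and the helper only decides which properties a reflection-based traversal would follow; it does not perform the traversal itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker for properties that point back up the graph.
[AttributeUsage(AttributeTargets.Property)]
public sealed class ParentAttribute : Attribute { }

public class TreeNode
{
    public string Name { get; set; }

    [Parent]
    public TreeNode Parent { get; set; }

    public List<TreeNode> Children { get; } = new List<TreeNode>();
}

public static class Walker
{
    // Returns the public instance properties a traversal should follow,
    // leaving out anything marked with [Parent].
    public static IEnumerable<PropertyInfo> TraversableProperties(Type type) =>
        type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => !p.IsDefined(typeof(ParentAttribute), inherit: true));
}
```

A naming convention would work the same way, with the Where clause testing p.Name instead of the attribute.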
What a coincidence; this is the topic of my blog this coming Monday. See it for more details. Until then, here's some code to give you an idea of how to do this:
static IEnumerable<T> Traversal<T>(
    T item,
    Func<T, IEnumerable<T>> children)
{
    var seen = new HashSet<T>();
    var stack = new Stack<T>();
    seen.Add(item);
    stack.Push(item);
    yield return item;
    while (stack.Count > 0)
    {
        T current = stack.Pop();
        foreach (T newItem in children(current))
        {
            if (!seen.Contains(newItem))
            {
                seen.Add(newItem);
                stack.Push(newItem);
                yield return newItem;
            }
        }
    }
}
The method takes two things: an item, and a relation that produces the set of everything that is adjacent to the item. It produces a depth-first traversal of the transitive and reflexive closure of the adjacency relation on the item. Let the number of items in the graph be n, and the maximum depth be 1 <= d <= n, assuming the branching factor is not bounded. This algorithm uses an explicit stack rather than recursion because (1) recursion in this case turns what should be an O(n) algorithm into O(nd), which is then something between O(n) and O(n^2), and (2) excessive recursion can blow the stack if d is more than a few hundred nodes.
Note that the peak memory usage of this algorithm is of course O(n + d) = O(n).
So, for example:
foreach(Node node in Traversal(myGraph.Root, n => n.Children))
Console.WriteLine(node.Name);
Make sense?
If you're doing a graph traversal, you can have a "visited" flag on each node. This ensures that you don't revisit a node and possibly get stuck in an infinite loop. I believe this is the standard way of performing a graph traversal.
This is a common problem, but the best approach depends on the scenario. An additional problem is that in many cases it isn't a problem visiting the same object twice - that doesn't imply recursion - for example, consider the tree:
A
=> B
   => C
=> D
   => C
This may be valid (think XmlSerializer, which would simply write the C instance out twice), so it is often necessary to push/pop objects on a stack to check for true recursion. The last time I implemented a "visitor", I kept a "depth" counter, and only enabled the stack checking beyond a certain threshold - that means that most trees simply end up doing some ++/--, but nothing more expensive. You can see the approach I took here.
I'm not exactly sure what you are trying to do here, but you could just maintain a hash table of all previously visited nodes while doing your breadth-first or depth-first search.
I published a post explaining in detail, with code examples, how to do object traversal by recursive reflection and how to detect and avoid recursive references to prevent a stack overflow exception: https://doguarslan.wordpress.com/2016/10/03/object-graph-traversal-by-recursive-reflection/
In that example I did a depth-first traversal using recursive reflection, and I maintained a HashSet of visited nodes for reference types. One thing to be careful about is to initialize your HashSet with a custom equality comparer that uses the object reference for the hash calculation, that is, the GetHashCode() implementation of the base object class rather than any overridden version. If the types of the properties you traverse override GetHashCode, two distinct objects may produce the same hash value through whatever heuristic the override uses, and you would wrongly conclude that you had detected a recursive reference. All you need to detect is whether any parent or child anywhere in the object tree points to the same location in memory.
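A minimal sketch of such a comparer is shown below. The ReferenceComparer and AlwaysEqual names are made up for illustration; AlwaysEqual exists only to show a type whose overrides would confuse a default HashSet. Newer .NET versions also ship a built-in ReferenceEqualityComparer that serves the same purpose.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Compares objects by identity, ignoring any Equals/GetHashCode overrides.
public sealed class ReferenceComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);

    // RuntimeHelpers.GetHashCode always uses the base object implementation.
    int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical type whose overrides make every instance look identical.
public sealed class AlwaysEqual
{
    public override bool Equals(object obj) => obj is AlwaysEqual;
    public override int GetHashCode() => 42;
}

public static class VisitedSetDemo
{
    // Counts how many of two distinct instances a visited-set keeps.
    public static int CountDistinct(IEqualityComparer<object> comparer)
    {
        var visited = comparer == null
            ? new HashSet<object>()
            : new HashSet<object>(comparer);
        visited.Add(new AlwaysEqual());
        visited.Add(new AlwaysEqual());
        return visited.Count;
    }
}
```

With the default comparer the second instance is treated as already visited, which is exactly the false "recursive reference" described above; with ReferenceComparer both instances are kept.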