We are using the classes in the Microsoft.VisualStudio.XmlEditor namespace (https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.xmleditor.aspx) to parse an XML document in a Visual Studio extension. Documents are parsed into an object structure, and only changes are re-parsed. This part works perfectly.
Now we want to use the parsed data objects to show adornments (https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.text.editor.intratextadornmenttag.aspx) at specific locations within the XML file, based on some conditions. The tagging system expects the extension to provide a list of SnapshotSpans at which the tags should be rendered.
The problem is that we cannot find a way to determine the correct positions of the XML tags. The XmlModel class provides a GetTextSpan() method (https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.xmleditor.xmlmodel.gettextspan.aspx), but the result of that call doesn't include the ITextSnapshot the coordinates belong to. Since the XML parser runs on a separate thread, there is no way to guarantee that these coordinates are still valid for the most recent snapshot of the text buffer when changes are made in quick succession. If the result were a SnapshotSpan struct (https://msdn.microsoft.com/en-us/library/microsoft.visualstudio.text.snapshotspan.aspx), we would be able to translate that span to the current snapshot.
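One direction we've considered (a sketch only, not verified against the XmlEditor internals): capture the ITextSnapshot that is current when the parse is kicked off, convert GetTextSpan()'s result to character offsets against that snapshot, and translate forward. SpanTranslator, MapToCurrent, and the start/length parameters are names made up for illustration:

```csharp
using Microsoft.VisualStudio.Text;

// Sketch (untested): translate a span computed against the parse-time
// snapshot to the buffer's latest snapshot. 'parseSnapshot' is assumed to be
// the ITextSnapshot captured when the parser run started; 'start'/'length'
// are assumed to be character offsets derived from XmlModel.GetTextSpan().
internal static class SpanTranslator
{
    public static SnapshotSpan MapToCurrent(ITextSnapshot parseSnapshot, int start, int length)
    {
        SnapshotSpan span = new SnapshotSpan(parseSnapshot, start, length);
        return span.TranslateTo(parseSnapshot.TextBuffer.CurrentSnapshot,
                                SpanTrackingMode.EdgeExclusive);
    }
}
```

If the spans need to survive many edits, an ITrackingSpan created against the parse-time snapshot would serve the same purpose.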
Does anybody know a proper way to resolve this issue?
Related
I'm writing a script editor that needs to transform the user's input (which is similar to script syntax) into valid C# according to some rules I define. For example, if the user puts in
using System;
public string hello()
{
    return "Hi!" // whoops, semicolon here is missing!
}
I'd need to transform this to
using System;
public class ContainerClass
{
    public string hello()
    {
        return "Hi!" // whoops, semicolon here is missing!
    }
}
My transformation will insert new nodes (such as the class declaration) and might move existing ones around, but it will never modify or remove existing ones. (I know SourceCodeKind = Script does something vaguely similar, but I can't use that for a variety of reasons.)
Now I need to come up with a way to do this transformation given the following considerations:
Since I need to run the transformation each time the user changes the original document (i.e. types even a single letter), I can't afford to re-parse the entire thing every time, for performance reasons. For example, if the user inserts the missing semicolon after "Hi!", ideally I would just insert the same (or a cloned) node into my already transformed document instead of re-parsing everything. I suppose that rules out standard modification mechanisms such as DocumentEditor.
I need to have a way to re-map locations from my transformed document to locations in the original document. Since I will never delete nodes, I think theoretically this should be possible (but how?).
This is necessary e.g. as I would end up with diagnostic messages (and intellisense information etc.) pointing to locations in the transformed document, and need to get the original document's location for these to actually show them to the user.
Can anyone think of a more or less direct way to do this? Are there maybe even some Roslyn helper classes for use cases like this?
My ideas are below. I'm not quite sure they'd work, and I think they'd be very hard to implement, so I'm hoping there is an easier way, to be honest.
For #1, my only idea was to get the text changes (Document.GetTextChangesAsync) of the original document after its source code changes, and then somehow find out which nodes have been affected (maybe get the nodes that intersect the edited area in the old and new document, then compute which ones have been deleted, added or modified), and then apply those changes to my transformed document. This seems awfully complex, though.
For #2, my only idea so far was to enable tracking for nodes of the original document. Then I would find whatever node a location points to in the transformed document, find the node in the original document it originated from, and then take that node's location.
But the problem is that, e.g., the code above would produce a diagnostic error pointing to the location right after "Hi!", with a location span of length 0, as the semicolon is missing. So the location doesn't really point to a node at all. Maybe I could try finding adjacent nodes in that case?!
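For #2, since the transformation only ever inserts text and never removes it, plain offset bookkeeping may be enough, independent of any Roslyn machinery: record each inserted span and subtract the accumulated insertion lengths when mapping a transformed position back. This is only a sketch under that assumption; InsertionMap and its methods are made-up names:

```csharp
using System;
using System.Collections.Generic;

// Sketch: map a position in the transformed document back to the original,
// assuming the transformation only INSERTS spans (never edits or removes),
// and that insertions are recorded in ascending position order.
class InsertionMap
{
    // (position in the transformed doc, length) of each inserted span.
    private readonly List<KeyValuePair<int, int>> inserts =
        new List<KeyValuePair<int, int>>();

    public void RecordInsert(int transformedPos, int length)
    {
        inserts.Add(new KeyValuePair<int, int>(transformedPos, length));
    }

    // Returns the original position, or -1 if the position lies inside
    // inserted (synthetic) text that has no original counterpart -
    // exactly the "missing semicolon" diagnostic case discussed above.
    public int ToOriginal(int transformedPos)
    {
        int shift = 0;
        foreach (KeyValuePair<int, int> ins in inserts)
        {
            if (transformedPos < ins.Key) break;
            if (transformedPos < ins.Key + ins.Value) return -1; // inside inserted text
            shift += ins.Value;
        }
        return transformedPos - shift;
    }
}
```

A diagnostic that falls inside inserted text (ToOriginal returns -1) could then be clamped to the nearest boundary of the inserted span to find an adjacent original position.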
I have two documents that I'm trying to merge partially.
I take some parts from document A that are inside an RTF content control and copy all child elements of the content control's sdtContent to another sdtContent in document B using AppendChild and CloneNode(true) (deep cloning):
foreach (var srcChildElement in sourceDocumentSdtContentBlockNode.ChildElements)
{
    targetSdtContentBlock.AppendChild(srcChildElement.CloneNode(true));
}
The problem is that if the content (list items) in my source document A is formatted with bullet points, the result in document B will be numbered list items.
Why doesn't the result keep the source's style when I clone it? I thought it would just keep its style, as I don't manipulate it. There doesn't seem to be a formatting issue in document B either: I can manually insert bullets or a numbered list without a problem.
I even created two completely new Word documents and the same thing happened, so it is surely not an issue with the existing files.
Update:
I found out that there is a separate numbering definition in a document (https://msdn.microsoft.com/en-us/library/office/ee922775(v=office.14).aspx) which is not part of the node I copy. When I clone the node, this information is not included, as it lives in that separate numbering definition. Is there a way to copy a node to a new document and tell it to retain the numbering styles from the numbering definition? The other option would be to manually check whether the cloned node contains a numPr element and, if so, also extract the definition (including changing and reassigning the IDs). I would prefer a less complex way to copy an element and hope there is one :-)
Any ideas?
Well, I did manage to solve this. As already mentioned, the numbering information is stored in a separate part (a file inside the document zip):
https://msdn.microsoft.com/en-us/library/office/ee922775(v=office.14).aspx
From the numbering definitions (in the NumberingPart) of the source, I copied the necessary ones (those referenced by a numPr inside a paragraph) to the target file by looking up the numId of each numPr.
I cloned the num element and also the corresponding abstractNum element into the target. While doing this, I also replaced the IDs - numId, abstractNumId and nsid (a hex value) - incrementing each by 1 past the existing element IDs in the target, to make sure there won't be any conflicts.
It was a bit of work, but very doable.
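A simplified sketch of that copy step, using plain XmlDocument with the w: namespace omitted for brevity (real WordprocessingML elements are namespace-qualified, and the nsid remapping mentioned above is also left out). NumberingCopier and its method names are made up for illustration:

```csharp
using System;
using System.Xml;

// Sketch: copy a <num> and its <abstractNum> from the source numbering part
// to the target, assigning fresh IDs so nothing collides.
static class NumberingCopier
{
    public static int CopyNumbering(XmlDocument source, XmlDocument target, int sourceNumId)
    {
        XmlElement num = (XmlElement)source.SelectSingleNode(
            "//num[@numId='" + sourceNumId + "']");
        string abstractId = num.SelectSingleNode("abstractNumId").Attributes["val"].Value;
        XmlElement abstractNum = (XmlElement)source.SelectSingleNode(
            "//abstractNum[@abstractNumId='" + abstractId + "']");

        // Increment past the highest IDs already present in the target.
        int newNumId = MaxAttr(target, "//num/@numId") + 1;
        int newAbstractId = MaxAttr(target, "//abstractNum/@abstractNumId") + 1;

        XmlElement numCopy = (XmlElement)target.ImportNode(num, true);
        numCopy.SetAttribute("numId", newNumId.ToString());
        ((XmlElement)numCopy.SelectSingleNode("abstractNumId"))
            .SetAttribute("val", newAbstractId.ToString());

        XmlElement abstractCopy = (XmlElement)target.ImportNode(abstractNum, true);
        abstractCopy.SetAttribute("abstractNumId", newAbstractId.ToString());

        target.DocumentElement.AppendChild(abstractCopy);
        target.DocumentElement.AppendChild(numCopy);
        return newNumId; // use this ID in the copied paragraph's numPr
    }

    static int MaxAttr(XmlDocument doc, string xpath)
    {
        int max = 0;
        foreach (XmlAttribute a in doc.SelectNodes(xpath))
            max = Math.Max(max, int.Parse(a.Value));
        return max;
    }
}
```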
I think my problem is general enough to ask the question, but specific enough that nobody has asked it yet - or I wasn't able to find it.
The task is the following: an application's configuration was stored inside an XML document. The application has since evolved, and the current configuration is stored inside a different XML document. The new XML is different in structure as well, so simply copying the old config will break the application.
Our goal is simple: We want to move the configuration changes from the old XML document into the new one. We have access to:
Non-modified old XML
Modified old XML
Non-modified new XML
From the information above, I have to create the new XML document, where the modifications from the old XML are re-applied.
I've figured out the algorithm already, but I don't want to reinvent the wheel here. The high level algorithm is the following:
Compare each node inside the non-modified XML with the modified XML.
If the node is not modified, move on.
If a node is removed, copy its XPath location to the removed-nodes collection.
If a node is added, copy its XPath location and its content into the added-nodes collection.
If a node is modified, copy the modified atomic content / attribute to the modified-nodes collection.
If a node is not atomic, apply the above algorithm to each of its descendant nodes (recursive algorithm).
Open the new XML, go to each modification location, and
Remove the deleted nodes
Add the added nodes
Apply the modifications
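The recursive compare step could be sketched like this. It matches child elements by name only, which breaks down when sibling names repeat or order changes, so real documents may need smarter matching (e.g. by a key attribute); XmlDiff and the list parameters are made-up names:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Minimal sketch of the recursive compare between the non-modified and the
// modified old XML, collecting removed / added / modified XPath locations.
static class XmlDiff
{
    public static void Compare(XmlElement oldEl, XmlElement newEl, string path,
                               List<string> removed, List<string> added,
                               List<string> modified)
    {
        Dictionary<string, XmlElement> oldChildren = ChildMap(oldEl);
        Dictionary<string, XmlElement> newChildren = ChildMap(newEl);

        foreach (KeyValuePair<string, XmlElement> kv in oldChildren)
        {
            string childPath = path + "/" + kv.Key;
            XmlElement counterpart;
            if (!newChildren.TryGetValue(kv.Key, out counterpart))
                removed.Add(childPath);                    // node was removed
            else if (IsAtomic(kv.Value) && IsAtomic(counterpart))
            {
                if (kv.Value.InnerText != counterpart.InnerText)
                    modified.Add(childPath);               // atomic content changed
            }
            else
                Compare(kv.Value, counterpart, childPath,  // recurse into non-atomic nodes
                        removed, added, modified);
        }
        foreach (KeyValuePair<string, XmlElement> kv in newChildren)
            if (!oldChildren.ContainsKey(kv.Key))
                added.Add(path + "/" + kv.Key);            // node was added
    }

    static Dictionary<string, XmlElement> ChildMap(XmlElement parent)
    {
        Dictionary<string, XmlElement> map = new Dictionary<string, XmlElement>();
        foreach (XmlNode n in parent.ChildNodes)
            if (n is XmlElement) map[n.Name] = (XmlElement)n;
        return map;
    }

    // Atomic = no element children (text/attributes only).
    static bool IsAtomic(XmlElement e)
    {
        foreach (XmlNode n in e.ChildNodes)
            if (n is XmlElement) return false;
        return true;
    }
}
```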
The XML layout of the collected modifications would look something like this:
<xml-diff>
  <added-node location="/xpath/to/the/node">
    <content-of-the-added-node with="attributes">
      <and-sub-elements-as-well/>
    </content-of-the-added-node>
  </added-node>
  <removed-node location="/xpath/to/the/node"/>
  <modified-node location="/xpath/to/the/node">
    <content-of-the-modified-node>modified-atomic-content</content-of-the-modified-node>
  </modified-node>
  <added-attribute location="/xpath/to/the/#attribute">value</added-attribute>
  <removed-attribute location="/xpath/to/the/#attribute"/>
  <modified-attribute location="/xpath/to/the/#attribute">new-value</modified-attribute>
</xml-diff>
content-of-the-added-node and content-of-the-modified-node are both nodes inside the modified old XML.
After locating the modifications inside the old XML, the task is pretty straightforward. I can re-apply the modifications inside the new XML wherever the XPath hasn't changed. I also have a mapping which describes which old XPath value has changed to which new XPath value, so the configuration changes can be applied correctly. (For example, /root/node1 has moved to /root/collections/node1, etc.)
I know that XSLT is used to transform one XML document into another. The tricky part here is detecting what the modifications - the transformations - are. Sadly, processing XML is a bit tricky, since the order of the nodes is not always kept, yet the document can still mean the same thing.
My questions are:
Is XSLT the right path to approach this problem, or should I use something else?
If XSLT is, what is the right transformation algorithm to detect these changes recursively?
If XSLT isn't the answer what is?
Can you provide a simple XSLT I can begin my work with?
Please note that I'm totally new in XSLT. I'm familiar with XML, and have some basic understanding of XSD.
I'm kind of stuck having to use .NET 2.0, so LINQ to XML isn't available, although I would be interested in how it would compare...
I had to write an internal program to download, extract, and compare some large XML files (about 10 MB each) that are essentially build configurations. I first attempted using libraries such as Microsoft's XML diff/patch, but comparing the files was taking 2-3 minutes, even when ignoring whitespace, namespaces, etc. (I tested each ignore option one at a time to try to figure out what was speediest). Then I tried to implement my own ideas - lists of nodes from XmlDocument objects, dictionaries keyed on the root's direct descendants (45,000 children, by the way) mapping to ints indicating each node's position in the XML document... all took at least 2 minutes to run.
My final implementation finishes in 1-2 seconds - I made a system process call to diff with a few lines of context and saved those results to display (our development machines include cygwin, thank goodness).
I can't help but think there is a better, XML-specific way to do this that would be just as fast as a plain-text diff - especially since all I'm really interested in is the Name element that is the child of each direct descendant, and I could throw away 4/5 of the file for my purposes (we only need to know which files were included, nothing else involving language or version).
So, as popular as XML is, I'm sure somebody out there has had to do something similar. What is a fast, efficient way to compare these large XML files? (preferably open source or free)
Edit: a sample of the nodes - I only need to find missing Name elements (there are over 45k nodes as well):
<file>
  <name>SomeFile</name>
  <version>10.234</version>
  <countries>CA,US</countries>
  <languages>EN</languages>
  <types>blah blah</types>
  <internal>N</internal>
</file>
XmlDocument source = new XmlDocument();
source.Load("source.xml");

// Index every <file> by its <name> child.
Dictionary<string, XmlNode> files = new Dictionary<string, XmlNode>();
foreach (XmlNode file in source.SelectNodes("//file"))
    files.Add(file.SelectSingleNode("./name").InnerText, file);

XmlDocument source2 = new XmlDocument();
source2.Load("source2.xml");

XmlNode value;
foreach (XmlNode file in source2.SelectNodes("//file"))
{
    if (files.TryGetValue(file.SelectSingleNode("./name").InnerText, out value))
    {
        // This file is in both source and source2.
    }
    else
    {
        // This file is only in source2.
    }
}
I am not sure exactly what you want, I hope that this example will help you in your quest.
Diffing XML can be done in many ways. You're not being very specific about the details, though. What does transpire is that the files are large and you can throw away 4/5 of the information.
Well, then the algorithm is as follows:
Normalize and reduce the documents to the information that matters.
Save the results.
Compare the results.
And the implementation:
Use the XmlReader API, which is efficient, to produce plain-text representations of your information. Why a plain-text representation? Because diff tools are predicated on the assumption that the input is plain text. And so are our eyeballs. Why XmlReader? You could use SAX, which is memory-efficient, but XmlReader is more efficient still. As for the precise spec of that plain-text file... you're just not including enough information.
Save the plain text files to some temp directory.
Use a command-line diff utility like GnuWin32 diff to get the diff output. Yeah, I know, not pure and proper, but it works out of the box and there's no coding to be done. If you are familiar with some C# diff API (I am not), then use that API instead, of course.
Delete the temp files. (Or optionally keep them if you're going to reuse them.)
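For the build-configuration files in this question, step 1 could look roughly like this: stream the document with XmlReader and emit one plain-text line per <name> element (the element names come from the sample above; XmlReducer is a made-up name, and this assumes only <file> elements contain <name> children):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Sketch: reduce a build-configuration XML document to one line per <name>,
// discarding version/country/language/type data we don't care about.
static class XmlReducer
{
    public static string ReduceToNames(TextReader input)
    {
        StringBuilder output = new StringBuilder();
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Jump from one <name> element to the next; emit one line each.
            while (reader.ReadToFollowing("name"))
                output.Append(reader.ReadElementContentAsString()).Append('\n');
        }
        return output.ToString();
    }
}
```

The resulting text files (one per document) can then be handed to any line-oriented diff tool. Both XmlReader.ReadToFollowing and ReadElementContentAsString exist in .NET 2.0, which matters given the constraint mentioned earlier.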
I'm trying to sort through a collection of DeepZoom sub-images based on arbitrary data associated with each image. The sub-images get loaded automagically through an XML file generated by DeepZoom Composer. I don't see a clear way to associate arbitrary data with a DeepZoom sub-image.
The solutions that seem most obvious to me are brittle and don't scale well. Ideally, I'd like to put the relevant data in the generated XML file, but I'd lose that information on the next set of generated images.
Is there a well-established way of accomplishing this goal?
As you've noticed, DeepZoomComposer supports a <Tag></Tag> element which you can use in your Silverlight MultiScaleImage control (filtering by tag example).
You are also right that you would 'lose' any information you add to the XML file when you edit in DeepZoomComposer and re-generate (however, you don't lose data you typed into DeepZoomComposer itself).
To get around this problem, I've written a little console application called TagUpdater -- basically it works like this:
You put your metadata IN THE IMAGES: the JPG file format supports lots of different fields, but for now let's use Title, Keywords (tags), Description and Rating.
You add your images to Microsoft's DeepZoomComposer (don't necessarily bother laying them out, since you will probably want to sort them dynamically; and don't bother entering any metadata) and Export as normal
Call TagUpdater.exe Metadata.xml from the console (DeepZoomComposer will have generated the Metadata.xml file).
TagUpdater extracts the metadata directly from your images and updates Metadata.xml (see below). It is destructive to the existing <Tag> data, but otherwise the file can be used as before to provide metadata for a DeepZoom collection in a MultiScaleImage control.
<Image>
  <FileName>C:\Documents and Settings\xxxxxx\My Documents\Expression\Deep Zoom Composer Projects\Bhutan\source images\page01.jpg</FileName>
  <x>0</x>
  <y>0</y>
  <Width>0.241254523522316</Width>
  <Height>0.27256162721473</Height>
  <ZOrder>1</ZOrder>
  <Tag>Bhutan,Mask</Tag>
  <Description>Land of the Thunder Dragon</Description>
  <Title>Bhutan 2008</Title>
  <Rating>3</Rating>
</Image>
You can keep adding images/regenerating because the metadata is coming from the images (not the DeepZoomComposer tag box).
Here's an example - notice how the tags and description on the right are updated as you hover over each image, as well as the visible images being filtered based on clicking a tag.
Kirupa's source can be modified to use this extra data...
string tagString = g.Element("Tag").Value;
// get new elements as well
string descriptionString = g.Element("Description").Value;
string titleString = g.Element("Title").Value;
string ratingString = g.Element("Rating").Value;
Hope that's of some interest - TagUpdater itself isn't the only way to accomplish this. It's pretty simple: it just opens the Metadata.xml file, loops through the <Image> elements to open each <FileName>, extracts the metadata, adds the additional XML elements and saves the XML. Using the filename as a 'key', you could pull additional information from a database (e.g. a price or more description data) and expand the XML file as much as you want.
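For illustration, that expand-the-XML step could look roughly like this with LINQ to XML. The Descendants-based lookup and the LookupDescription helper are assumptions (TagUpdater's actual code isn't shown, and in reality the value would come from the image file's metadata fields):

```csharp
using System;
using System.Xml.Linq;

// Sketch: loop through <Image> elements and append extra metadata elements,
// in the spirit of what TagUpdater does.
static class MetadataExpander
{
    // Hypothetical stand-in for reading Title/Keywords/Description/Rating
    // out of the image file named in <FileName>.
    public static string LookupDescription(string fileName)
    {
        return "Land of the Thunder Dragon";
    }

    public static void Expand(XDocument metadata)
    {
        foreach (XElement image in metadata.Descendants("Image"))
        {
            string fileName = (string)image.Element("FileName");
            // Destructive to any existing element, like TagUpdater's <Tag> handling:
            image.Elements("Description").Remove();
            image.Add(new XElement("Description", LookupDescription(fileName)));
        }
    }
}
```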
Metadata.xml has a Tag property that can be associated with each image. Hurray!