Programmatic XML Diff / Merge in C# - c#

At this moment, I am managing a piece of software that has multiple XML configuration files. When a new version of the software is released, the base config files sometimes change, so we currently have the software call KDiff on startup. If it detects a change, it prompts the user to choose which changes to keep.
The problem with this approach is that KDiff is a line-comparing program and is not aware of XML structure (nodes, attributes, and so on).
Ideally, I would like to programmatically work with a library in C# (since we're a MS shop) that can Diff two XML files: a Source XML and a Current Working XML.
And then Merge the two together using a few simple rules:
If the Current Working XML has a node that the Source XML does not, remove it.
If the Source XML has a node that the Current Working XML does not, add it.
If both have the same node and the values differ, favor the Source XML's value, unless the Source XML's value is set to "UseExistingValue".
For example, here's the "Source" XML:
<Configuration>
<Items>
<Item Id="1" Position="true">
<Location X="UseExistingValue" Y="UseExistingValue" Z="UseExistingValue" />
<Something/>
<SomethingElse/>
</Item>
</Items>
</Configuration>
And here's the "Current Working" XML:
<Configuration>
<Items>
<Item Id="1" Position="false">
<Location X="123" Y="234" Z="345" />
<Another/>
<Something/>
</Item>
</Items>
</Configuration>
And the merged version would look like:
<Configuration>
<Items>
<Item Id="1" Position="true">
<Location X="123" Y="234" Z="345" />
<Something/>
<SomethingElse/>
</Item>
</Items>
</Configuration>
I've looked at the MS XML Diff and Patch Tool and it definitely merges the files together, but doesn't allow for the programmatic rules that I want to define.
XMLUnit for Java devs seems promising, but the .NET version of it seems underdeveloped, which is unfortunate.
Anyone have any suggestions for either scriptable XML Diff/Merge tools and/or .NET libraries (paid or free)?
Thanks.

After a couple days of messing around, I found a solution that I think works for me. Maybe it could work for other people as well.
The MS XML Diff and Patch tool turned out to be a viable option. When you diff the first file against the second file, it creates an XML "DiffGram" listing the changes it detected between the two XML files.
To take care of all 3 rules that I listed above, I diffed the two files in one direction, then opened the DiffGram file using LINQ to XML and removed all the "add" and "remove" elements.
XNamespace xd = "http://schemas.microsoft.com/xmltools/2002/xmldiff";
var doc = XDocument.Load(_diffGramFile);
doc.Root.DescendantsAndSelf(xd + "add").Remove();
doc.Root.DescendantsAndSelf(xd + "remove").Remove();
Then I patched up (merged) this edited diffgram against the first file and created a partially merged temporary file. This takes care of Rules 1 and 2.
Next, I diffed the partially merged file against the first file, then opened the new DiffGram and removed all "change" elements whose value is "UseExistingValue".
var newdoc = XDocument.Load(_diffGramFile);
newdoc.Root.DescendantsAndSelf(xd + "change")
.Where(x => x.Value == "UseExistingValue").Remove();
Finally, I patched the partially merged file with this edited DiffGram, which takes care of Rule 3. Saving the result out to XML then produces the final XML, merged according to the rules defined above.
Hopefully this can help out other people.
HINT: After installing the XmlDiffPatch library, the XmlDiffPatch DLL can be found in C:\Windows\assembly\GAC\XmlDiffPatch\1.0.8.28__b03f5f7f11d50a3a\XmlDiffPatch.dll
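For comparison, the three rules can also be applied directly with LINQ to XML, without going through a DiffGram. This is only a sketch and not the DiffGram approach described above. The class name XmlRuleMerge and the matching rule (same element name and same Id attribute) are assumptions made for illustration. It does not handle mixed content, and repeated siblings that have the same name and no Id would all match the first one.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlRuleMerge
{
    private const string KeepMarker = "UseExistingValue";

    // Assumed matching rule: same element name and same "Id" attribute (or both without one).
    private static bool SameNode(XElement a, XElement b)
    {
        return a.Name == b.Name && (string)a.Attribute("Id") == (string)b.Attribute("Id");
    }

    // Builds the merged element from the Source shape; "current" may be null.
    public static XElement Merge(XElement source, XElement current)
    {
        var merged = new XElement(source.Name);

        // Rule 3 for attributes: Source wins unless it says UseExistingValue.
        foreach (var attr in source.Attributes())
        {
            var existing = current == null ? null : current.Attribute(attr.Name);
            bool keep = attr.Value == KeepMarker && existing != null;
            merged.SetAttributeValue(attr.Name, keep ? existing.Value : attr.Value);
        }

        // Rule 3 for leaf text content.
        if (!source.HasElements && !source.IsEmpty)
        {
            bool keep = source.Value == KeepMarker && current != null;
            merged.Value = keep ? current.Value : source.Value;
        }

        // Rules 1 and 2: only children present in Source are emitted, so nodes
        // missing from Current are added and nodes missing from Source are dropped.
        foreach (var child in source.Elements())
        {
            var match = current == null
                ? null
                : current.Elements().FirstOrDefault(c => SameNode(child, c));
            merged.Add(Merge(child, match));
        }

        return merged;
    }
}
```

Calling XmlRuleMerge.Merge(sourceDoc.Root, currentDoc.Root) on the two example documents from the question should keep the Location coordinates from the Current Working XML, take Position from the Source XML, drop Another, and add SomethingElse.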

Related

Unexpected XML Declaration. XML Declaration must be the first node in the document and no whitespace characters are allowed to appear before it linit [duplicate]

This error,
The processing instruction target matching "[xX][mM][lL]" is not allowed
occurs whenever I run an XSLT page that begins as follows:
<?xml version="1.0" encoding="windows-1256"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:include href="../header.xsl"/>
<xsl:template match="/">
<xsl:call-template name="pstyle"/>
<xsl:call-template name="Validation"/>
<xsl:variable name="strLang">
<xsl:value-of select="//lang"/>
</xsl:variable>
<!-- ////////////// Page Title ///////////// -->
<title>
<xsl:value-of select="//ListStudentFinishedExam.Title"/>
</title>
Note: I removed any leading spaces before the first line, but the error still occurs!
Xerces-based tools will emit the following error
The processing instruction target matching "[xX][mM][lL]" is not allowed.
when an XML declaration is encountered anywhere other than at the top of an XML file.
This is a valid diagnostic message; other XML parsers should issue a similar error message in this situation.
To correct the problem, check the following possibilities:
Some blank space or other visible content exists before the <?xml ?>
declaration.
Resolution: remove blank space or any other
visible content before the XML declaration.
Some invisible content exists before the <?xml ?>
declaration. Most commonly this is a Byte Order Mark
(BOM).
Resolution:
Remove the BOM using techniques such as those suggested by the W3C
page on the BOM in HTML.
A stray <?xml ?> declaration exists within the XML content.
This can happen when XML files are combined programmatically or
via cut-and-paste. There can only be one <?xml ?> declaration
in an XML file, and it can only be at the top.
Resolution: Search for
<?xml in a case-insensitive manner, and remove all but the top XML
declaration from the file.
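The three checks above can be automated. The sketch below is written in C# and is only an illustration: the class name XmlDeclarationCheck is hypothetical, it only looks for a UTF-8 BOM, and it assumes an ASCII-compatible encoding (it will not work on UTF-16 files).

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class XmlDeclarationCheck
{
    // Returns a list of findings; an empty list means none of the three causes was detected.
    public static List<string> Diagnose(byte[] bytes)
    {
        var findings = new List<string>();

        // Invisible content: a UTF-8 byte order mark is the bytes EF BB BF.
        bool hasUtf8Bom = bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        int start = hasUtf8Bom ? 3 : 0;
        if (hasUtf8Bom)
            findings.Add("UTF-8 BOM at start of file");

        // Decode as ASCII: non-ASCII bytes become '?', which is enough to locate markup.
        string text = Encoding.ASCII.GetString(bytes, start, bytes.Length - start);

        // "<?xml" followed by whitespace, case-insensitive. This does not match
        // other processing instructions such as <?xml-stylesheet ...?>.
        MatchCollection decls = Regex.Matches(text, @"<\?xml\s", RegexOptions.IgnoreCase);

        // Visible content or blank space before the declaration (offsets exclude the BOM).
        if (decls.Count > 0 && decls[0].Index > 0)
            findings.Add(decls[0].Index + " character(s) before the first XML declaration");

        // Stray declarations further down in the content.
        if (decls.Count > 1)
            findings.Add((decls.Count - 1) + " extra XML declaration(s), next at offset " + decls[1].Index);

        return findings;
    }
}
```

It can be run against a file with XmlDeclarationCheck.Diagnose(File.ReadAllBytes(path)). Reading raw bytes matters here, because text-based readers typically strip the BOM before you can see it.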
Debug your XML file. Either there is stray whitespace, or there are extra or missing tags.
For better understanding build the project through the command line. Windows: gradlew build
In my case, AndroidManifest.xml has a blank space at the very first line
<Empty Row> // This Creates the issue
<?xml version="1.0" encoding="utf-8"?>
There was auto generated Copyright message in XML and a blank line before <resources> tag, once I removed it my build was successful.
Just remove this line: <?xml version="1.0" encoding="utf-8"?>. This kind of error is caused only by this line. Alternatively, check that your line is formatted exactly like the one shown in this answer.
I had a similar issue with 50,000 rdf/xml files in 5,000 directories (the Project Gutenberg catalog file). I solved it with riot (in the jena distribution)
The files are at cache/epub/NN/nn.rdf (where NN is a number).
From the directory above the one that holds all the files, i.e. in cache, run:
riot epub/*/*.rdf --output=turtle > allTurtle.ttl
This produces possibly many warnings but the result is in a format which can be loaded into jena (using the fuseki web interface).
Surprisingly simple (at least in this case).
Another cause of the above error is a corrupted jar file. I got the same error, but for JUnit, when running unit tests. Removing the jar and downloading it again fixed the issue.
In my case it was a wrong path in a config file: the file was not found (the path was wrong), and this exception came out:
Error configuring from input stream. Initial cause was The processing
instruction target matching "[xX][mM][lL]" is not allowed.
For PHP, put this line of code before you start printing your XML:
while(ob_get_level()) ob_end_clean();
It's worth checking your server's folders to see if there's a stray pom.xml hanging around.
I found that I had the problem everyone else described with a malformed pom.xml, but in a folder that I didn't expect to be on the server. An old build was sticking around unwelcome D:
In my case, a tab was the troublemaker. Replacing the tab with a space should resolve the issue.

How to search through XML to find bad nodes

I have a large XML file (68 MB), and I am using SQL Server Business Intelligence Studio 2008 to extract the XML data into a database. There is an error somewhere in the XML file that prevents the import from executing, possibly a missing tag or something like that. The file is so large that I can't manually sort through it looking for the error.
Below is a sample of the XML used.
How can I use XPath to sort through the XML in VS 2012 using C#?
An example would be great!
-<PhoneNumberList>
<PhoneNumber value="1234567890" type="Phone"/>
</PhoneNumberList>
-<YearsOfServiceList>
<YearsOfService experienceInMonths="24" description="SuperAdmin" objectCode="049"/>
</YearsOfServiceList>
</Person>
-<Person dob="1960-01-09T00:00:00" lastName="Smith" middleName="Will" firstName="John" id="9999-9999-9999">
-<SiteList>
-<Site id="2014" siteLongName="HA" siteCode="1255" systemCode="999">
-<StaffPositionList>
<StaffPosition id="73" staffPosition="Administrator"/>
</StaffPositionList>
</Site>
</SiteList>
-<ProgramList>
<Program id="1234" siteLongName="ABC" siteCode="0000" systemCode="205"/>
<Program id="5678" siteLongName="DEF" siteCode="0000" systemCode="357"/>
</ProgramList>
-<TypeList>
<Type Description="Leader" certificateType="D"/>
<Type Description="Professional" certificateType="P"/>
</TypeList>
-<EmailList>
<Email value="jsmith#somesite.com" type="Email"/>
</EmailList>
-<PhoneNumberList>
<PhoneNumber value="1234567890" type="Phone"/>
</PhoneNumberList>
-<YearsOfServiceList>
<YearsOfService experienceInMonths="24" description="SuperAdmin" objectCode="049"/>
</YearsOfServiceList>
</Person>
</PersonList>
</GetPersonDetail>
If you want to do it in code, create an XSD file describing a valid format for the data, embed it as a resource in your app, and then use code like this (here document is an XDocument loaded from your XML file):
var errors = new List<string>();
var schemaSet = new XmlSchemaSet();
schemaSet.Add("", XmlReader.Create(new StringReader(Properties.Resources.NameOfXSDResource)));
document.Validate(schemaSet, (sender, args) =>
{
errors.Add(args.Message);
}
);
This will give you a list of validation errors.
You don't need to search "by hand" if you use a competent text editor. Notepad++'s XML plugin, for instance, can determine whether your XML as a whole is well-formed and whether it is valid, and it reports separate error messages for each check.
If you don't have a schema and the file is well-formed, you can use .NET's System.Xml namespace to read in the document and then iterate through its nodes using LINQ to XML, which lets you control very finely which nodes go where. With LINQ, you could create a new XML file with only the valid entries, procedurally correct the invalid entries as you find them, or even write to your SQL Server database directly.
Your troubleshooting process should be something like the following:
Is the XML well-formed? I.e., does it conform to the fundamental rules of XML?
Is the XML valid? I.e., does it have the elements and attributes you expect?
Is your import query accurate?
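The first question in that list (is the XML well-formed?) can be answered in C# with a streaming XmlReader, which does not load the whole 68 MB file into memory. A minimal sketch follows; the class name WellFormedCheck is hypothetical. With default settings the reader rejects documents that contain a DTD, and it stops at the first fatal error, so the check has to be rerun after each fix.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Returns null when the input parses cleanly; otherwise a message
    // giving the line and column where the parser gave up.
    public static string FirstError(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                // Read every node; the reader throws on the first well-formedness error.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return "Line " + ex.LineNumber + ", position " + ex.LinePosition + ": " + ex.Message;
        }
    }
}
```

For a file on disk, pass a StreamReader, for example WellFormedCheck.FirstError(new StreamReader(path)). The reported position is where the parser noticed the problem, which for a missing end tag can be some distance after the element that was left open.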
For things like this I usually have luck checking and fixing the data in Notepad++. Install the XmlTools plugin and that has a menu for checking the xml syntax and tags.
Also, those dashes will give you problems, it's best to save out the xml file directly without copying by hand.
A 68MB XML file is no problem for XML editors such as XMLBlueprint 64-bit (http://www.xmlblueprint.com/) or Stylus Studio (http://www.stylusstudio.com/). Just check the well-formedness of your xml file (F7 in XMLBlueprint) and the editor will display the errors.

How do I read this xml File?

I have this xml file
<?xml version="1.0" encoding="utf-8" ?>
<parameters>
<parameters
registerLink="linkValue"
TextBox.name="nameValue"
/>
</parameters>
I want to print "linkValue" and "nameValue" with code like this:
Console.WriteLine("registerLink: " + registerLink);
Console.WriteLine("TextBox.name: " + TextBox.name);
Thanks
The easiest API is XLinq (System.Xml.Linq)
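var doc = XDocument.Load(fileName);
// Ideally the path would be parameters/parameter; I follow the question and use parameters/parameters.
var par = doc.Element("parameters").Element("parameters");
string registerLink = par.Attribute("registerLink").Value;
// "TextBox.name" is a legal XML attribute name but not a legal C# identifier, so use a plain variable name.
string textBoxName = par.Attribute("TextBox.name").Value;
You could use an XML reader like this one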
http://msdn.microsoft.com/en-us/library/cc189056%28v=vs.95%29.aspx
Once you have a working sample look here to find out how to open an xml reader from a file stream. File must be located in project directory
http://support.microsoft.com/kb/307548
Once you have that done you can add an open file dialog box to find any file on the computer and even validate the .xml extension and more.
Edit: As you can see in the comments below, Hanks solution is better, faster, and easier. My solution would only be useful if you have huge xml files with tons of data. You may still be interested in the file dialog box as well.

Check XML via XSD schemas which are specified in xsi:schemaLocation attribute

Sorry for my English.
C# 4.0, LINQ to XML.
I get XDocument from an XML file, for example:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../../support/localization.xslt"?>
<doc:resources xmlns:doc="http://mea-orbis.com/2012/XMLSchema/localization"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mea-orbis.com/2012/XMLSchema/localization ../../support/localization.xsd">
<!--Заголовки столбцов таблицы-->
<doc:record id="commandName">Команда</doc:record>
<doc:record id="commandNameDescript">Краткое описание</doc:record>
<doc:record id="commandNameNotes">Примечание</doc:record>
<!--******************************************-->
<!--Наименования групп команд-->
<doc:record id="group1">Команды смены кодировок</doc:record>
<!--******************************************-->
<!--Наименования команд, их краткое описание и примечания-->
<doc:record id="dwgconvertName">DWGCONVERT</doc:record>
<doc:record id="dwgconvertKeyWords">кодировка</doc:record>
<doc:record id="dwgconvertDescr">конвертация текущего чертежа (версии AutoCAD до 2011 включительно)</doc:record>
<doc:record id="dwgconvertcpName">DWGCONVERTCP</doc:record>
<doc:record id="dwgconvertcpKeyWords">кодировка</doc:record>
<doc:record id="dwgconvertcpDescr">конвертация текущего чертежа (версии AutoCAD с 2008)</doc:record>
<doc:record id="dwgconvertfilesName">DWGCONVERTFILES</doc:record>
<doc:record id="dwgconvertfilesKeyW">кодировка</doc:record>
<doc:record id="dwgconvertfilesDescr">конвертация выбранных пользователем чертежей</doc:record>
<doc:record id="dwgconvertstrName">DWGCONVERTSTR</doc:record>
<doc:record id="dwgconvertstrKeyW">кодировка</doc:record>
<doc:record id="dwgconvertstrDescr">
конвертация отдельного текстового примитива (примитивов)
из текущего чертежа
</doc:record>
<doc:record id="ns">DWGCONVERT</doc:record>
<doc:record id="arxload">Загрузка всех ARX файлов</doc:record>
<doc:record id="netload">Загрузка всех DLL файлов</doc:record>
</doc:resources>
I need to check XDocument for XSD schema validation. I found two examples in MSDN:
first, second.
But in the samples, the XSD schema is separate from the file. I don't want to do superfluous operations because these schemas are already specified in the xsi:schemaLocation attribute of my XML file.
What is the correct way to execute a check of object XDocument, in which all necessary schemas are already specified in the xsi:schemaLocation attribute?
Regards
This may be a little late, but I found this question, and then I found this answer elsewhere on Stack Overflow: Validating an XML against referenced XSD in C#. I just checked that it worked at least for a locally stored xsd.
Processing of xsi attributes for schema locations is not built in the framework; you will have to do that yourself.
The way I've done it involves the following steps:
reading schemaLocation or noNamespaceSchemaLocation attributes associated with your document root element. This is where you have to come up with your solution that best fits your needs; if you don't care about performance, then you can simply use the DOM based API - it may result in going over the source XML twice: once to parse it into memory, then again to validate it. Or, you use a fast, forward only reader to only read all the attributes of the root node, looking for your xsi: ones, then abandon the reading once past the root element.
Once found, you'll have to parse the attribute values; typically you invoke a string.Split() on whitespace (\t, \r, \n, 0x20), trimming all, discarding empties and making pairs (when namespaces are used). Ultimately, this gives you the list of URIs where your XSDs are located
For each URI, resolve it to an absolute URI, converting any relative URI using the absolute base URI of your XML file
Build an XmlSchemaSet by adding all the XSDs; compile it and use it for validation by getting a reader from your source XML.
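The steps above can be sketched with the DOM-style approach (an already loaded XDocument). This is an illustration under assumptions, not a definitive implementation: the class name SchemaLocationValidator and the baseUri parameter are hypothetical, and baseUri must be the absolute URI of the XML file so that relative schema paths can be resolved.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.Schema;

public static class SchemaLocationValidator
{
    private static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";

    // Splits xsi:schemaLocation on whitespace into (namespace, location) pairs.
    public static List<KeyValuePair<string, string>> ReadPairs(XDocument doc)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        string raw = (string)doc.Root.Attribute(Xsi + "schemaLocation") ?? "";

        // A null separator array means "split on any whitespace".
        string[] tokens = raw.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i + 1 < tokens.Length; i += 2)
            pairs.Add(new KeyValuePair<string, string>(tokens[i], tokens[i + 1]));
        return pairs;
    }

    // Loads every referenced XSD and returns the validation messages.
    public static List<string> Validate(XDocument doc, Uri baseUri)
    {
        var schemas = new XmlSchemaSet();

        foreach (var pair in ReadPairs(doc))
            schemas.Add(pair.Key, new Uri(baseUri, pair.Value).AbsoluteUri);

        // noNamespaceSchemaLocation holds a single location; passing null lets
        // the schema's own targetNamespace (none, in this case) be used.
        string noNs = (string)doc.Root.Attribute(Xsi + "noNamespaceSchemaLocation");
        if (noNs != null)
            schemas.Add(null, new Uri(baseUri, noNs.Trim()).AbsoluteUri);

        schemas.Compile();

        var errors = new List<string>();
        doc.Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

If the document is loaded with XDocument.Load(path, LoadOptions.SetBaseUri), the value of doc.BaseUri can be used to build the baseUri argument.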

Need help writing and reading XML file in C#

Idea - List of vertices(Key, X, Y, Priority to store).
<?xml version="1.0" encoding="utf-8"?>
<Vertices>
<Vertex Key="0" X="149" Y="209" Priority="7" />
<Vertex Key="1" X="278" Y="128" Priority="7" />
</Vertex>
Is this valid XML? It keeps telling me that the root element is missing when I try to open it. If it is valid, can someone provide C# XDocument code to open this file?
It's not valid XML - your closing element has the wrong name - this would be valid:
<?xml version="1.0" encoding="utf-8"?>
<Vertices>
<Vertex Key="0" X="149" Y="209" Priority="7" />
<Vertex Key="1" X="278" Y="128" Priority="7" />
</Vertices>
Also make sure that if you are loading an XML file you use XDocument.Load and not XDocument.Parse.
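Once the closing tag is fixed, reading the vertices with XDocument could look like the following sketch. The Vertex class and the VertexFile helper are hypothetical names introduced for illustration; the explicit (int) casts on XAttribute are part of LINQ to XML and throw if an attribute is missing or not a number.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder for one <Vertex> element.
public sealed class Vertex
{
    public int Key { get; set; }
    public int X { get; set; }
    public int Y { get; set; }
    public int Priority { get; set; }
}

public static class VertexFile
{
    // Maps every <Vertex> child of the root element to a Vertex object.
    public static List<Vertex> Read(XDocument doc)
    {
        return doc.Root
            .Elements("Vertex")
            .Select(v => new Vertex
            {
                Key = (int)v.Attribute("Key"),
                X = (int)v.Attribute("X"),
                Y = (int)v.Attribute("Y"),
                Priority = (int)v.Attribute("Priority")
            })
            .ToList();
    }
}
```

For a file on disk this would be called as VertexFile.Read(XDocument.Load(path)), where path is the location of the XML file.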
You are opening <Vertices> but closing </Vertex>. Need to change that last closing tag to </Vertices>
Side note:
If you load an XML file into Visual Studio it will tell you if it is invalid XML and why. For this example it gave the errors:
Error 1 Tag was not closed. XMLFile1.xml Line 2 Column 5
Error 2 Expecting end tag </Vertices>. XMLFile1.xml Line 5 Column 6
If you do not own Visual Studio, you can download the Express version for free and get the same functionality.
