Parse an xml document when namespace is no-longer available - c#

I have a number rather large, complex xml documents that I need to loop through. An xmlns is defined at the top of the document however the url this points to is no longer available.
What's the best way to parse the file to get the important data from it using C#?
I tried to load it into a Dataset but would occasionally receive the errors:
The table (endpoint) cannot be the child table to itself in nested relations.
or
Cannot add a SimpleContent column to a table containing element columns or nested relations.
XPath was my next port of call but I had problems because of the lack of namespace.
I suspect this is seriously limiting my options, but does anyone have any suggestions?
Snippet of the XML document:
<?xml version="1.0" encoding="UTF-8"?>
<cdr:cdr_set xmlns:cdr="http://www.naturalconvergence.com/schema/cdr/v3/cdr">
<!-- Copyright (c) 2001-2009, all rights reserved -->
<cdr:cdr xmlns:cdr="http://www.naturalconvergence.com/schema/cdr/v3/cdr">
<cdr:call_id>2040-1247062136726-5485131</cdr:call_id>
<cdr:cdr_id>1</cdr:cdr_id>
<cdr:status>Normal</cdr:status>
<cdr:responsibility>
<cdr:tenant id="17">
<cdr:name>SpiriTel plc</cdr:name>
</cdr:tenant>
<cdr:site id="45">
<cdr:name>KWS</cdr:name>
<cdr:time_zone>GB</cdr:time_zone>
</cdr:site>
</cdr:responsibility>
<cdr:originator type="sipGateway">
<cdr:sipGateway id="3">
<cdr:name>Audiocodes-91</cdr:name>
</cdr:sipGateway>
</cdr:originator>
<cdr:terminator type="group">
<cdr:group>
<cdr:tenant id="17">
<cdr:name>SpiriTel plc</cdr:name>
</cdr:tenant>
<cdr:type>Broadcast</cdr:type>
<cdr:extension>6024</cdr:extension>
<cdr:name>OLD PMS DDIS DO NOT USE</cdr:name>
</cdr:group>
</cdr:terminator>
<cdr:initiation>Dialed</cdr:initiation>
<cdr:calling_number>02087893850</cdr:calling_number>
<cdr:dialed_number>01942760142</cdr:dialed_number>
<cdr:target>6024</cdr:target>
<cdr:direction>Inbound</cdr:direction>
<cdr:disposition>No Answer</cdr:disposition>
<cdr:timezone>GB</cdr:timezone>
<cdr:origination_timestamp>2009-07-08T15:08:56.727+01:00</cdr:origination_timestamp>
<cdr:release_timestamp>2009-07-08T15:09:26.493+01:00</cdr:release_timestamp>
<cdr:release_cause>Normal Clearing</cdr:release_cause>
<cdr:call_duration>PT29S</cdr:call_duration>
<cdr:redirected>false</cdr:redirected>
<cdr:conference>false</cdr:conference>
<cdr:transferred>false</cdr:transferred>
<cdr:estimated>false</cdr:estimated>
<cdr:interim>false</cdr:interim>
<cdr:segments>
<cdr:segment>
<cdr:originationTimestamp>2009-07-08T15:08:56.727+01:00</cdr:originationTimestamp>
<cdr:initiation>Dialed</cdr:initiation>
<cdr:call_id>2040-1247062136726-5485131</cdr:call_id>
<cdr:originator type="sipGateway">
<cdr:sipGateway id="3">
<cdr:name>Audiocodes-91</cdr:name>
</cdr:sipGateway>
</cdr:originator>
<cdr:termination_attempt>
<cdr:termination_timestamp>2009-07-08T15:08:56.728+01:00</cdr:termination_timestamp>
<cdr:terminator type="group">
<cdr:group>
<cdr:tenant id="17">
<cdr:name>SpiriTel plc</cdr:name>
</cdr:tenant>
<cdr:type>Broadcast</cdr:type>
<cdr:extension>6024</cdr:extension>
<cdr:name>OLD PMS DDIS DO NOT USE</cdr:name>
</cdr:group>
</cdr:terminator>
<cdr:provided_address>01942760142</cdr:provided_address>
<cdr:direction>Inbound</cdr:direction>
<cdr:disposition>No Answer</cdr:disposition>
</cdr:termination_attempt>
</cdr:segment>
</cdr:segments>
</cdr:cdr>
...
</cdr:cdr_set>
Each entry is essentially the same but there are sometimes differences such as some of the fields may be missing, if they aren't required.

These values in an xml file are identifiers, not locators. Unless you are expecting to download a schema, it is not needed at all, and can be "flibble" if needed. I expect the best thing would be to just load it into XmlDocument / XDocument and try to access the data.
For example:
XmlDocument doc = new XmlDocument();
doc.Load("cdr.xml");
XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("cdr", "http://www.naturalconvergence.com/schema/cdr/v3/cdr");
XmlElement el = (XmlElement)doc.SelectSingleNode(
"cdr:cdr_set/cdr:cdr/cdr:originator", ns);
Console.WriteLine(el.GetAttribute("type"));
or to loop over the cdr elements:
foreach (XmlElement cdr in doc.SelectNodes("/cdr:cdr_set/cdr:cdr", ns))
{
Console.WriteLine(cdr.SelectSingleNode("cdr:call_id", ns).InnerText);
}
Note that the aliases used in the document are largely unrelated to the aliases used in the XmlNamespaceManager, hence you need to re-declare it. I could have used x as my alias in the C# just as easily.
Of course, if you prefer to work with an object model; run it through xsd (where cdr.xml is your example file):
xsd cdr.xml
xsd cdr.xsd /classes
Now you can load it with XmlSerializer.

alternativley load it into an Xdocument and use linq2XML? ... although you might just get the same error.
I don't know what data you want, so its hard to suggest a query.
I personally prefer the use of XDocument to xmlDocument now in most cases.
the only problem with the automatic generation of an XSD is that it can get your datatypes pretty badly wrong if you are not using a good sized chunk of sample data.

Related

Generate XML File From Generated Object

I started with three (3) XSD files provided from an external party (one XSD links to the other two). I used the xsd.exe tool to generate a .NET object by running the following command: xsd.exe mof-simpleTypes.xsd mof-isp.xsd esf-submission.xsd /c and it generated a single CS file with a handful of partial objects.
I've created an XmlSerializerNamespaces object and fill with the namespaces required (two directly used in the provided sample XML file as well as two others that don't appear to be referenced). I have successfully generated an XML file using the following method:
private XmlDocument ConvertEsfToXml(ESFSubmissionType type)
{
var xml = new XmlDocument();
var serializer = new XmlSerializer(type.GetType());
string result;
using (var writer = new Utf8StringWriter()) //override of StringWriter to force UTF-8
{
serializer.Serialize(writer, type, _namespaces); //_namespaces object holds all 4 namespaces
result = writer.ToString();
}
xml.LoadXml(result);
return xml;
}
My problem that I'm facing is in the generated CS file, one of the objects has a property (another generated partial object) that is of type XmlElement. I have successfully built the object in code, and I'm having an issue converting the object to an XmlElement. The questions and answers I have found here on SO say convert it to an XmlDocument first and then take the DocumentElement property. This works, however the returned XML has namespaces embedded in the element as follows:
<esf:ESFSubmission xmlns:isp="http://www.for.gov.bc.ca/schema/isp" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:esf="http://www.for.gov.bc.ca/schema/esf">
<esf:submissionMetadata>
<esf:emailAddress>test#test.com</esf:emailAddress>
<esf:telephoneNumber>1234567890</esf:telephoneNumber>
</esf:submissionMetadata>
<esf:submissionContent>
<isp:ISPSubmission xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:esf="http://www.for.gov.bc.ca/schema/esf" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:isp="http://www.for.gov.bc.ca/schema/isp">
<isp:ISPMillReport>
<isp:reportMonth>12</isp:reportMonth>
<isp:reportYear>2014</isp:reportYear>
<isp:reportComment>comment</isp:reportComment>
<isp:ISPLumberDetail>
<isp:species>FI</isp:species>
Note: this is just a partial of the generated XML file (for illustration purposes).
As you can see, each XML node is prefixed with the namespace variable. My question is: how can I do this in code? Is my approach sound and if so, then how do NOT include the namespaces in the ISPSubmission node OR if there is a better way to approach this problem that I overlooked, please provide insight. My desired outcome is to have all namespace definitions at the top of the document (their appropriate location) and not on the sub elements - as well as maintain the namespace variables on each element as illustrated above.
EDIT (after reggaeguitar's comment)
Here is the sample XML document I was provided
<?xml version="1.0" encoding="UTF-8"?>
<esf:ESFSubmission xmlns:esf="http://www.for.gov.bc.ca/schema/esf"
xmlns:isp="http://www.for.gov.bc.ca/schema/isp" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.for.gov.bc.ca/schema/esf esf-submission.xsd
http://www.for.gov.bc.ca/schema/isp mof-isp.xsd">
<esf:submissionMetadata>
<esf:emailAddress>mailto:eric.murphy#cgi.com</esf:emailAddress>
<esf:telephoneNumber>6044445555</esf:telephoneNumber>
</esf:submissionMetadata>
<esf:submissionContent>
<isp:ISPSubmission>
<isp:ISPMillReport>
<isp:reportMonth>06</isp:reportMonth>
<isp:reportYear>2014</isp:reportYear>
<isp:reportComment>Up to 4000 characters is permitted for notes in this element.</isp:reportComment>
<isp:ISPLumberDetail>
<isp:species>FI</isp:species>
<isp:lumberGrade>EC</isp:lumberGrade>
<isp:gradeDescription/>
<isp:size>2x4</isp:size>
<isp:finishType/>
<isp:length>10</isp:length>
<isp:thickWidthUom>IN</isp:thickWidthUom>
<isp:volumeUnitOfMeasure>MBM</isp:volumeUnitOfMeasure>
<isp:volume>11543.987</isp:volume>
<isp:amount>1467893.98</isp:amount>
<isp:invoiceNumber>837261</isp:invoiceNumber>
</isp:ISPLumberDetail>
<isp:ISPLumberDetail>
<isp:species>CE</isp:species>
<isp:lumberGrade/>
<isp:gradeDescription/>
<isp:size/>
<isp:finishType>D</isp:finishType>
<isp:thickness>40</isp:thickness>
<isp:width>100</isp:width>
<isp:thickWidthUom>MM</isp:thickWidthUom>
<isp:volumeUnitOfMeasure>MBM</isp:volumeUnitOfMeasure>
<isp:volume>9743.987</isp:volume>
<isp:amount>1247893.98</isp:amount>
<isp:invoiceNumber/>
</isp:ISPLumberDetail>
<isp:ISPChipDetail>
<isp:species>CE</isp:species>
<isp:unitOfMeasure>BDT</isp:unitOfMeasure>
<isp:wholeLogInd>N</isp:wholeLogInd>
<isp:destinationCode>FBCO</isp:destinationCode>
<isp:destinationDescription/>
<isp:volume>563</isp:volume>
<isp:amount>54463</isp:amount>
<isp:invoiceNumber>12345679</isp:invoiceNumber>
</isp:ISPChipDetail>
</isp:ISPMillReport>
<isp:ISPSubmitter>
<isp:millNumber>103</isp:millNumber>
<isp:contactName>Dave Marotto</isp:contactName>
<isp:contactEmail>eric.murphy#cgi.com</isp:contactEmail>
<isp:contactPhone>2507775555</isp:contactPhone>
<isp:contactPhoneExtension>1234</isp:contactPhoneExtension>
</isp:ISPSubmitter>
</isp:ISPSubmission>
</esf:submissionContent>
</esf:ESFSubmission>
Solved my problem by doing the whole thing in code and not even using the xsd.exe to generate a .NET object.

How to search through XML to find bad nodes

I have a large XML file (68Mb), I am using SQL Server Business Intelligence Studio 2008 to extract the XML data into a database. There is an error in the XML file some where that prevents it from executing. Possibly a missing tag or something like that. The file is so large I cant manually sort through it looking for the error.
Below is a sample of the the XML schema used.
How can I use XPath to sort through the XML in VS 2012 using C#?
An example would be great!
-<PhoneNumberList>
<PhoneNumber value="1234567890" type="Phone"/>
</PhoneNumberList>
-<YearsOfServiceList>
<YearsOfService experienceInMonths="24" description="SuperAdmin" objectCode="049"/>
</YearsOfServiceList>
</Person>
-<Person dob="1960-01-09T00:00:00" lastName="Smith" middleName="Will" firstName="John" id="9999-9999-9999">
-<SiteList>
-<Site id="2014" siteLongName="HA" siteCode="1255" systemCode="999">
-<StaffPositionList>
<StaffPosition id="73" staffPosition="Administrator"/>
</StaffPositionList>
</Site>
</SiteList>
-<ProgramList>
<Program id="1234" siteLongName="ABC" siteCode="0000" systemCode="205"/>
<Program id="5678" siteLongName="DEF" siteCode="0000" systemCode="357"/>
</ProgramList>
-<TypeList>
<Type Description="Leader" certificateType="D"/>
<Type Description="Professional" certificateType="P"/>
</TypeList>
-<EmailList>
<Email value="jsmith#somesite.com" type="Email"/>
</EmailList>
-<PhoneNumberList>
<PhoneNumber value="1234567890" type="Phone"/>
</PhoneNumberList>
-<YearsOfServiceList>
<YearsOfService experienceInMonths="24" description="SuperAdmin" objectCode="049"/>
</YearsOfServiceList>
</Person>
</PersonList>
</GetPersonDetail>
If you want to do it in code then create an XSD file describing a valid format for the data, embed it as a resource in your app and then use code like this
var errors = new List<string>();
var schemaSet = new XmlSchemaSet();
schemaSet.Add("", XmlReader.Create(new StringReader(Properties.Resources.NameOfXSDResource)));
document.Validate(schemaSet, (sender, args) =>
{
errors.Add(args.Message);
}
);
This will give you a list of validation errors.
You don't need to search "by hand" if you use a competent text editor. NotePad++'s XML plugin, for instance, can determine if your XML as a whole is well-formed or valid, and both instances will provide separate error messages.
If you don't have a schema and the file is well-formed, you can use the CLR's System.XML namespace to read in the document and then iterate through its nodes using LINQ-to-XML, which would allow you to very finely control which nodes go where. With LINQ, you could either create a new XML file with only the valid entries, procedurally correct the invalid entries as you determine where they are, or even just write to your SQL server database directly.
Your troubleshooting process should be something as follows:
Is the XML well-formed? I..e, does it comport to the fundamental rules of XML?
Is the XML valid? I.e., does it have the elements and attributes you expect?
Is your import query accurate?
For things like this I usually have luck checking and fixing the data in Notepad++. Install the XmlTools plugin and that has a menu for checking the xml syntax and tags.
Also, those dashes will give you problems, it's best to save out the xml file directly without copying by hand.
A 68MB XML file is no problem for XML editors such as XMLBlueprint 64-bit (http://www.xmlblueprint.com/) or Stylus Studio (http://www.stylusstudio.com/). Just check the well-formedness of your xml file (F7 in XMLBlueprint) and the editor will display the errors.

XDocument Add multiple XElements

In my Windows Phone 8 C#/XAML .NET 4.5 Project, I'm trying to create an XDocument with similar structure:
<element1>
<subelement1>
</subelement1>
<subelement2>
...etc...
</subelement2>
</element1>
<element2>
<subelement1>
</subelement1>
<subelement2>
...etc...
</subelement2>
</element2>
The method creating the document looks like (simplified for the question purposes):
... createXML()
{
XDocument doc = new XDocument();
XElement elem1 = new XElement("element1");
elem1.Add(new XElement("subelement1"));
XElement elem2 = new XElement("element2");
doc.Add(elem1);
doc.Add(elem2);
}
But I keep getting InvalidOperationException saying that it would create a invalid document structure.
I know why - it would cause the document to have multiple "root nodes" - but I effectively need it that way.
This structure is needed for webservice done by third party, which recieves the document as a string.
So the question is "How to achieve this structure? Should I use some other XObject instead?"
(I know that probably the most simple solution would be to use collection of XElements...just askin' if there is another way out of curiosity)
The structure that you specified at the top of the post is illegal, because valid XML documents must have a single root element; your document has two elements at the top level, which is not allowed.
You can solve this problem by adding a root element at creation time, and then discarding it when reading the document;
document = new XDocument(new XElement("root", elem1, elem2));

How to change the data within elements in a XML file using C#?

I'm kind of new to XML files in C# ASP.NET. I have a XML in the below format:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Installation>
<ServerIP>192.168.20.110</ServerIP>
<DB_Name>USTCKT1</DB_Name>
<Username>jorame</Username>
<Password>Cru$%e20</Password>
<Table_PreFix>TCK</Table_PreFix>
</Installation>
I need to change the values within each element. For example, when an user clicks I should be able to replace 192.168.20.110 with 192.168.1.12.
How can I accomplish this? Any help will be really appreciated.
You should look at using the methods in the XDocument class. http://msdn.microsoft.com/en-us/library/bb301598.aspx
Specifically look at the methods: Load(string) - to load an XML file, Element() - to access a specific element and Save(string) - to save the XML document. The page on Element() has some sample code which can help.
http://msdn.microsoft.com/en-us/library/system.xml.linq.xcontainer.element.aspx
You can do something like this using the XDocument class:
XDocument doc = XDocument.Load(file.xml);
doc.Element("Installation").Element("ServerIP").Value = "192.168.1.12";
//Update the rest of the elements
doc.Save(file.xml);
More Details
If you run into namespace issues when selecting your elements you will need to include the xml namespace in the XElement selectors eg doc.Element(namspace + "Installation")
In general, you can do it in the following steps:
Create a new XmlDocument object and load the content. The content might be a file or string.
Find the element that you want to modify. If the structure of your xml file is too complex, you can use xpath you find what you want.
Apply your modification to that element.
Update your xml file.
Here is a simple demo:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("file.xml"); // use LoadXml(string xml) to load xml string
string path = "/Installation/ServerIP";
XmlNode node = xmlDoc.SelectSingleNode(path); // use xpath to find a node
node.InnerText = "192.168.1.12"; // update node, replace the inner text
xmlDoc.Save("file.xml"); // save updated content
Hope it's helpful.

Query an XmlDocument without getting a 'Namespace prefix is not defined' problem

I've got an Xml document that both defines and references some namespaces. I load it into an XmlDocument object and to the best of my knowledge I create a XmlNamespaceManager object with which to query Xpath against. Problem is I'm getting XPath exceptions that the namespace "my" is not defined. How do I get the namespace manager to see that the namespaces I am referencing are already defined. Or rather how do I get the namespace definitions from the document to the namespace manager.
Furthermore tt strikes me as strange that you have to provide a namespace manager to the document which you create from the documents nametable in the first place. Even if you need to hardcode manual namespaces why can't you add them directly to the document. Why do you always have to pass this namespace manager with every single query? What can't XmlDocument just know?
Code:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(programFiles + #"Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\FEATURES\HfscBookingWorkflow\template.xml");
XmlNamespaceManager ns = new XmlNamespaceManager(xmlDoc.NameTable);
XmlNode referenceNode = xmlDoc.SelectSingleNode("/my:myFields/my:ReferenceNumber", ns);
referenceNode.InnerXml = this.bookingData.ReferenceNumber;
XmlNode titleNode = xmlDoc.SelectSingleNode("/my:myFields/my:Title", ns);
titleNode.InnerXml = this.bookingData.FamilyName;
Xml:
<?xml version="1.0" encoding="UTF-8" ?>
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:Inspection:-myXSD-2010-01-15T18-21-55" solutionVersion="1.0.0.104" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<my:myFields xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">
<my:DateRequested xsi:nil="true" />
<my:DateVisited xsi:nil="true" />
<my:ReferenceNumber />
<my:FireCall>false</my:FireCall>
Update:
ns.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
ns.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
This does the job, but it mean's I have to hard code to this particular xml schema. This schema represents an infopath form template. In particular the my namespace url will be different for every form template so I really don't want to hardcode this. It would be nice to find a clean way to get this namespace from the xml without resorting to RegEx.
I was hoping that the XmlNamespaceManager would just sort of pick up the namespace definitions form the NameTable. I mean their in the Xml but I still have to define them.
ns.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
ns.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
This does the job, but it mean's I have to hard code to this particular xml schema. This schema represents an infopath form template. In particular the my namespace url will be different for every form template so I really don't want to hardcode this. It would be nice to find a clean way to get this namespace from the xml without resorting to Regex.
I was hoping that the XmlNamespaceManager would just sort of pick up the namespace definitions form the NameTable. I mean their in the Xml but I still have to define them.
Here is the answer to the "What can't XmlDocument just know?" question.
NameTable is just an optimization for storing names. It has actually nothing to do with namespaces.
And even if XmlNamespaceManager could infer all namespaces and prefixes from XML doc that won't help in general case because of XML namespaces nature, e.g. what would XmlNamespaceManager map "my" prefix in this case:
<root>
<foo xmlns:my="blah"/>
<foo xmlns:my="balh-blah-blah"/>
</root>
Have you defined "my" in the namespace-manager?
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
Or better - choose something that is unlikely to conflict. It does seem odd that it didn't pick it up from the name-table, though.
For me with InfoPath 2007 this solved the problem
static public XmlNamespaceManager GetNameSpaceManager(this XmlDocument document)
{
XmlNamespaceManager xmlNamespaceManager = new XmlNamespaceManager(document.NameTable);
xmlNamespaceManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
xmlNamespaceManager.AddNamespace("dfs", "http://schemas.microsoft.com/office/infopath/2003/dataFormSolution");
xmlNamespaceManager.AddNamespace("d", "http://schemas.microsoft.com/office/infopath/2003/ado/dataFields");
xmlNamespaceManager.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2012-03-29T06:28:28");
xmlNamespaceManager.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
return xmlNamespaceManager;
}

Categories