Need to parse an XML string - C#

I need to parse an XML string (.NET, C#) which, unfortunately, is not well-formed. The XML stream that I am getting back is:
<fOpen>true</fOpen>
<ixBugParent>0</ixBugParent>
<sLatestTextSummary></sLatestTextSummary>
<sProject>Vantive</sProject>
<ixArea>9</ixArea>
I have tried using an XML reader, but it crashes because it thinks, and rightfully so, that there are multiple root elements whenever it tries to parse the stream.
Is there something that I can do about this? I can't change the XML, because I have no control over the code that sends it back.
Any help would be appreciated.
Thanks and Regards
Gagan Janjua

I think you can use the XmlParserContext in one of the XmlTextReader overloads to specify that the node type is an XmlNodeType.Element, similar to this example from MSDN (http://msdn.microsoft.com/en-us/library/cakk7ha0.aspx):
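// A verbatim string (@"...") is needed for a literal that spans several lines.
XmlTextReader tr = new XmlTextReader(@"<element1> abc </element1>
<element2> qrt </element2>
<?pi asldfjsd ?>
<!-- comment -->", XmlNodeType.Element, null);
// Read each node of the fragment in turn.
while (tr.Read()) {
    Console.WriteLine("NodeType: {0} NodeName: {1}", tr.NodeType, tr.Name);
}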

What you are getting back is a well-formed XML fragment but, as you pointed out, not a well-formed XML document. Can you either
wrap a top-level element around the returned elements, or
reference the returned XML fragment as an external entity from within a shell XML document, and pass the shell document to the XML reader?
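The first option (wrapping a top-level element around the fragment) could look roughly like the sketch below. The wrapper name "root" is arbitrary and chosen only for this example; the fragment is the one from the question.

```csharp
using System;
using System.Xml.Linq;

// The fragment from the question: several sibling elements, no single root.
string fragment =
    "<fOpen>true</fOpen>" +
    "<ixBugParent>0</ixBugParent>" +
    "<sLatestTextSummary></sLatestTextSummary>" +
    "<sProject>Vantive</sProject>" +
    "<ixArea>9</ixArea>";

// "root" is an arbitrary wrapper name (an assumption of this sketch).
XElement root = XElement.Parse("<root>" + fragment + "</root>");

// Each original element is now a child of the wrapper.
foreach (XElement child in root.Elements())
{
    Console.WriteLine("{0} = {1}", child.Name, child.Value);
}
```

This only works if the returned text does not start with its own XML declaration, since a declaration is only allowed at the very beginning of a document.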

Related

How to search through XML to find bad nodes

I have a large XML file (68 MB), and I am using SQL Server Business Intelligence Studio 2008 to extract the XML data into a database. There is an error somewhere in the XML file that prevents the import from executing, possibly a missing tag or something like that. The file is so large that I can't manually sort through it looking for the error.
Below is a sample of the XML schema used.
How can I use XPath to sort through the XML in VS 2012 using C#?
An example would be great!
-<PhoneNumberList>
<PhoneNumber value="1234567890" type="Phone"/>
</PhoneNumberList>
-<YearsOfServiceList>
<YearsOfService experienceInMonths="24" description="SuperAdmin" objectCode="049"/>
</YearsOfServiceList>
</Person>
-<Person dob="1960-01-09T00:00:00" lastName="Smith" middleName="Will" firstName="John" id="9999-9999-9999">
-<SiteList>
-<Site id="2014" siteLongName="HA" siteCode="1255" systemCode="999">
-<StaffPositionList>
<StaffPosition id="73" staffPosition="Administrator"/>
</StaffPositionList>
</Site>
</SiteList>
-<ProgramList>
<Program id="1234" siteLongName="ABC" siteCode="0000" systemCode="205"/>
<Program id="5678" siteLongName="DEF" siteCode="0000" systemCode="357"/>
</ProgramList>
-<TypeList>
<Type Description="Leader" certificateType="D"/>
<Type Description="Professional" certificateType="P"/>
</TypeList>
-<EmailList>
<Email value="jsmith#somesite.com" type="Email"/>
</EmailList>
-<PhoneNumberList>
<PhoneNumber value="1234567890" type="Phone"/>
</PhoneNumberList>
-<YearsOfServiceList>
<YearsOfService experienceInMonths="24" description="SuperAdmin" objectCode="049"/>
</YearsOfServiceList>
</Person>
</PersonList>
</GetPersonDetail>
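If you want to do it in code, then create an XSD file describing a valid format for the data, embed it as a resource in your app, and then use code like this (document is an already-loaded XDocument):
// Requires: using System.Xml.Schema; (Validate is an extension method on XDocument)
var errors = new List<string>();
var schemaSet = new XmlSchemaSet();
// "" means: use the target namespace declared inside the XSD itself.
schemaSet.Add("", XmlReader.Create(new StringReader(Properties.Resources.NameOfXSDResource)));
// The callback is invoked once per validation problem instead of throwing.
document.Validate(schemaSet, (sender, args) =>
{
    errors.Add(args.Message);
});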
This will give you a list of validation errors.
You don't need to search "by hand" if you use a competent text editor. Notepad++'s XML plugin, for instance, can determine whether your XML as a whole is well-formed or valid, and each check produces its own error messages.
If you don't have a schema and the file is well-formed, you can use the .NET System.Xml namespace to read in the document and then iterate through its nodes using LINQ-to-XML, which gives you very fine control over which nodes go where. With LINQ, you could create a new XML file with only the valid entries, procedurally correct the invalid entries as you determine where they are, or even just write to your SQL Server database directly.
Your troubleshooting process should be something like the following:
Is the XML well-formed? I.e., does it conform to the fundamental rules of XML?
Is the XML valid? I.e., does it have the elements and attributes you expect?
Is your import query accurate?
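For the first question (is it well-formed?), a streaming XmlReader can answer it without loading the whole file into memory. The following is only a sketch: the helper name TryFindError is made up for this example, and a small in-memory string stands in for the real 68 MB file.

```csharp
using System;
using System.IO;
using System.Xml;

// Small stand-in for the real file; <Person> is never closed.
// For a file on disk, pass File.OpenText(path) instead of a StringReader.
string sample = "<PersonList>\n<Person>\n</PersonList>";

string message;
if (TryFindError(new StringReader(sample), out message))
{
    Console.WriteLine(message);
}
else
{
    Console.WriteLine("Well-formed.");
}

// Hypothetical helper: returns true and a description of the first
// well-formedness error, or false if the whole input parses cleanly.
static bool TryFindError(TextReader input, out string message)
{
    try
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Read node by node; nothing is kept in memory.
            while (reader.Read()) { }
        }
        message = "";
        return false;
    }
    catch (XmlException ex)
    {
        // XmlException carries the location where parsing failed.
        message = string.Format("Line {0}, position {1}: {2}",
            ex.LineNumber, ex.LinePosition, ex.Message);
        return true;
    }
}
```

The reader stops at the first error, so after fixing it you would run the check again to find the next one.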
For things like this I usually have luck checking and fixing the data in Notepad++. Install the XmlTools plugin and that has a menu for checking the xml syntax and tags.
Also, those dashes will give you problems, it's best to save out the xml file directly without copying by hand.
A 68MB XML file is no problem for XML editors such as XMLBlueprint 64-bit (http://www.xmlblueprint.com/) or Stylus Studio (http://www.stylusstudio.com/). Just check the well-formedness of your xml file (F7 in XMLBlueprint) and the editor will display the errors.

How to change the data within elements in a XML file using C#?

I'm kind of new to XML files in C# ASP.NET. I have an XML file in the format below:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Installation>
<ServerIP>192.168.20.110</ServerIP>
<DB_Name>USTCKT1</DB_Name>
<Username>jorame</Username>
<Password>Cru$%e20</Password>
<Table_PreFix>TCK</Table_PreFix>
</Installation>
I need to change the values within each element. For example, when a user clicks, I should be able to replace 192.168.20.110 with 192.168.1.12.
How can I accomplish this? Any help will be really appreciated.
You should look at using the methods in the XDocument class. http://msdn.microsoft.com/en-us/library/bb301598.aspx
Specifically look at the methods: Load(string) - to load an XML file, Element() - to access a specific element and Save(string) - to save the XML document. The page on Element() has some sample code which can help.
http://msdn.microsoft.com/en-us/library/system.xml.linq.xcontainer.element.aspx
You can do something like this using the XDocument class:
XDocument doc = XDocument.Load("file.xml"); // the path must be a string
doc.Element("Installation").Element("ServerIP").Value = "192.168.1.12";
//Update the rest of the elements
doc.Save("file.xml");
More Details
If you run into namespace issues when selecting your elements, you will need to include the XML namespace in the XElement selectors, e.g. doc.Element(ns + "Installation"), where ns is an XNamespace.
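As a rough illustration of the namespace point: the file in the question has no namespace, so the URI "urn:example:install" below is invented purely for this sketch.

```csharp
using System;
using System.Xml.Linq;

// "urn:example:install" is a made-up namespace URI for illustration only.
string xml =
    "<Installation xmlns=\"urn:example:install\">" +
    "<ServerIP>192.168.20.110</ServerIP>" +
    "</Installation>";

XDocument doc = XDocument.Parse(xml);
XNamespace ns = "urn:example:install";

// Without the namespace, doc.Element("Installation") would return null here.
doc.Element(ns + "Installation").Element(ns + "ServerIP").Value = "192.168.1.12";

Console.WriteLine(doc);
```

Note that a default namespace declared on the root applies to the child elements too, which is why ServerIP also needs the ns prefix.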
In general, you can do it in the following steps:
Create a new XmlDocument object and load the content. The content might be a file or string.
Find the element that you want to modify. If the structure of your XML file is too complex, you can use XPath to find what you want.
Apply your modification to that element.
Update your xml file.
Here is a simple demo:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("file.xml"); // use LoadXml(string xml) to load xml string
string path = "/Installation/ServerIP";
XmlNode node = xmlDoc.SelectSingleNode(path); // use xpath to find a node
node.InnerText = "192.168.1.12"; // update node, replace the inner text
xmlDoc.Save("file.xml"); // save updated content
Hope it's helpful.

XML Illegal Characters in path

I am querying a SOAP-based service and want to analyze the XML it returns. However, when I try to load the XML into an XDocument in order to query the data, I get an 'illegal characters in path' error. Below is the XML returned from the service. I simply want to get the list of competitions and put them into a List I have set up. The XML does load into an XmlDocument, though, so it must be correctly formatted?
Any advice on the best way to do this and get round the error would be greatly appreciated.
<?xml version="1.0" ?>
- <gsmrs version="2.0" sport="soccer" lang="en" last_generated="2010-08-27 20:40:05">
- <method method_id="3" name="get_competitions">
<parameter name="area_id" value="1" />
<parameter name="authorized" value="yes" />
<parameter name="lang" value="en" />
</method>
<competition competition_id="11" name="2. Bundesliga" soccertype="default" teamtype="default" display_order="20" type="club" area_id="80" last_updated="2010-08-27 19:53:14" area_name="Germany" countrycode="DEU" />
</gsmrs>
Here is my code, I need to be able to query the data in an XDoc:
string theXml = myGSM.get_competitions("", "", 1, "en", "yes");
XmlDocument myDoc = new XmlDocument();
myDoc.LoadXml(theXml);
XDocument xDoc = XDocument.Load(myDoc.InnerXml);
You don't show your source code, however I guess what you are doing is this:
string xml = ... retrieve ...;
XmlDocument doc = new XmlDocument();
doc.Load(xml); // error thrown here
The Load method expects a file name, not the XML content itself. To load XML from a string, just use the LoadXml method:
... same code ...
doc.LoadXml(xml);
Similarly, with XDocument the Load(string) method expects a file name, not the XML content itself. However, there's no LoadXml method, so one correct way of loading the XML from a string is like this:
string xml = ... retrieve ...;
XDocument doc;
using (StringReader s = new StringReader(xml))
{
doc = XDocument.Load(s);
}
As a general rule when developing anything, it's a very good idea to pay attention to the semantics (meaning) of parameters, not just their types. When the type of a parameter is string, it doesn't mean one can feed in just anything that is a string.
Also, with respect to your updated question, it makes no sense to use XmlDocument and XDocument at the same time. Choose one or the other.
Following up on Ondrej Tucny's answer :
If you would like to use an XML string instead, you can use an XElement and call its Parse method (for your needs, either XElement or XDocument would do).
For example:
string theXML = "... get something xml-ish...";
XElement xEle = XElement.Parse(theXML);
// do something with your XElement
The XElement's Parse method lets you pass in an XML string, while the Load method needs a file name.
Why not
XDocument.Parse(theXml);
I assume this will be the right solution
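To sketch how that might look for the original goal (getting the competitions into a List): the XML below is a trimmed copy of the response in the question with the browser dashes removed, and the choice of the name attribute as the list content is an assumption about what is wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Trimmed copy of the service response from the question (dashes removed).
string theXml =
    "<?xml version=\"1.0\" ?>" +
    "<gsmrs version=\"2.0\" sport=\"soccer\" lang=\"en\">" +
    "<method method_id=\"3\" name=\"get_competitions\" />" +
    "<competition competition_id=\"11\" name=\"2. Bundesliga\" area_name=\"Germany\" />" +
    "</gsmrs>";

// Parse takes the XML text itself; Load(string) would treat it as a path.
XDocument xDoc = XDocument.Parse(theXml);

// Collect the name attribute of every <competition> under the root.
List<string> competitions = xDoc.Root
    .Elements("competition")
    .Select(c => (string)c.Attribute("name"))
    .ToList();

foreach (string name in competitions)
{
    Console.WriteLine(name);
}
```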
If this is really your output it is illegal XML because of the minus characters ('-'). I suspect that you have cut and pasted this from a browser such as IE. You must show the exact XML from a text editor, not a browser.

How to handle xml that contains nested xml using c# xmlreader?

I'm using c# to interact with a database that has an exposed REST API. The table that I'm interested in contains forum posts, some of which themselves contain xml.
Whenever my result set contains a post that has xml, my application throws an error as follows:
Exception Details: System.Xml.XmlException: '>' is an unexpected token. The expected token is '"' or '''. Line 1, position 62.
And this is the line that fails:
Line 44: ds.ReadXml(xmlData);
And this is the code I'm using:
var webClient = new WebClient();
string searchString = searchValue.Text;
string requestUrl = "http://myserver/restapi.ashx/search.xml?pagesize=4&pageindex=0&query=";
requestUrl += searchString;
XmlReaderSettings settings = new XmlReaderSettings();
settings.ProhibitDtd = false;
XmlReader xmlData = XmlReader.Create(webClient.OpenRead(requestUrl),settings);
DataSet ds = new DataSet();
ds.ReadXml(xmlData);
Repeater1.DataSource = ds.Tables[1];
Repeater1.DataBind();
And this is the type of XML record that it's choking on (the stuff in the <Body> node is causing the problem):
<SearchResults PageSize="1" PageIndex="0" TotalCount="342">
<SearchResult>
<ContentId>994</ContentId>
<Title>Help Files: What are they written in?</Title>
<Url>http://myserver/linktest.aspx</Url>
<Date>2008-10-16T16:18:00+01:00</Date><ContentType>post</ContentType>
<Body><div class="ForumPostBodyArea"> <div class="ForumPostContentText"> <p>Can anyone see anything obviously wrong with this xml, when its fired to CRM Its creating 13 null records.</p> <p><?xml version="1.0" encoding="UTF-8"?><soap:Envelope xmlns:typens="http://tempuri.org/type" soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:wsdlns="http://tempuri.org/wsdl/" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Header><SessionHeader><sessionId xsi:type="xsd:long">18208442035524</sessionId></SessionHeader></soap:Header><soap:Body><typens:add><entityname xsi:type="xsd:string">lead</entityname><records xsi:nil="true" xsi:type="typens:ewarebase" /><status xsi:type="xsd:string">PreRegistration</status><requester xsi:type="xsd:string">Mimnagh</requester><personfirstname xsi:type="xsd:string">Sean</personfirstname><personlastname xsi:type="xsd:string">Test2</personlastname><personsalutation xsi:type="xsd:string">Mr</personsalutation><details xsi:type="xsd:string">test project details</details><description xsi:type="xsd:string">test description details</description><comments xsi:type="xsd:string">test project comments</comments><personemail xsi:type="xsd:string">smimnagh#mac.com</personemail><personphonenumber xsi:type="xsd:string">12334566777</personphonenumber><type xsi:type="xsd:string">PreReg</type><companyname xsi:type="xsd:string">Site Client</companyname></typens:add></soap:Body></soap:Envelope></p> <p>Many thanks</p> </div> </div>
</Body>
<Tags>
<Tag>xml</Tag>
</Tags>
<IndexedAt>2010-07-08T11:53:46.848+01:00</IndexedAt>
</SearchResult>
</SearchResults>
Is there something that I can do with the xmlreader to make it ignore whatever's causing the problem?
Please note that I can't change the XML prior to consuming it - so if it's malformed then I wonder if there's a way to ignore or modify that particular record without generating an error?
Thanks!
It looks like some of your quotes need escaping in the contents of some of your elements. Try using &quot; for quote marks that aren't wrapping attribute values.
UPDATE:
Because the data you want to read isn't strictly XML (it's nearly XML), your best bet is one of the following:
1. You or your boss, if you have one, scream at the third party because they're not sending you well-formed XML.
2. Perform some horrible hack to try to convert whatever you get into XML.
If you have to go with point 2, the simplest thing that pops into my head is to read the characters of the 'XML', counting in and out of angle brackets. If you find any " characters while you're not within any angle brackets, replace the " with &quot;.
But note that doing that is a complete last resort.
The Content of your <Body> tag is not well formed. XML is very strict with the syntax of data. Either embed a CDATA section into your XML or escape the string properly.
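For reference, this is roughly what the two options look like on the producing side, i.e. what the service would have to do. It does not help if you truly cannot change the sender, but it shows what well-formed output should contain. The sample post text is invented for this sketch.

```csharp
using System;
using System.Xml.Linq;

// Invented sample: markup a forum post might contain. Here it is data, not structure.
string post = "<p class=\"x\">5 < 6 & \"quoted\"</p>";

// Option 1: plain text content; <, > and & are escaped when serialized.
XElement escaped = new XElement("Body", post);

// Option 2: a CDATA section, which carries the text verbatim.
XElement cdata = new XElement("Body", new XCData(post));

Console.WriteLine(escaped);
Console.WriteLine(cdata);

// Both forms parse back to the original text.
string fromEscaped = XElement.Parse(escaped.ToString()).Value;
string fromCData = XElement.Parse(cdata.ToString()).Value;
Console.WriteLine(fromEscaped == post && fromCData == post);
```

One limitation of the CDATA route is that the text itself must not contain the sequence ]]>, since that ends the section.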

Read in an XML String with Namespaces for Use in an XSL Transformation

In an ASP.NET 2.0 website, I have a string representing some well-formed XML. I am currently creating an XmlDocument object with it and running an XSL transformation for display in a Web form. Everything was operating fine until the XML input started to contain namespaces.
How can I read in this string and allow namespaces?
I've included the current code below. The string source comes from an HTML encoded node in a WordPress RSS feed.
XPathNavigator myNav= myPost.CreateNavigator();
XmlNamespaceManager myManager = new XmlNamespaceManager(myNav.NameTable);
myManager.AddNamespace("content", "http://purl.org/rss/1.0/modules/content/");
string myPost = HttpUtility.HtmlDecode("<post>" +
myNav.SelectSingleNode("//item[1]/content:encoded", myManager).InnerXml +
"</post>");
XmlDocument myDocument = new XmlDocument();
myDocument.LoadXml(myPost.ToString());
The error is on the last line:
"System.Xml.XmlException: 'w' is an undeclared namespace. Line 12, position 201. at System.Xml.XmlTextReaderImpl.Throw(Exception e) ..."
Your code looks right.
The problem is probably in the xml document you're trying to load.
It must have elements with a "w" prefix, without having that prefix declared in the XML document
For example, you should have:
<test xmlns:w="http://...">
<w:elementInWNamespace />
</test>
(your document is probably missing the xmlns:w="http://")
Gut feel: one of the namespaces declared in //content:encoded is being dropped (probably because you're using the literal .InnerXml property).
What does the 'w' namespace evaluate to in the myNav DOM? You'll want to add xmlns:w= to your post node. There will probably be others too.
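A minimal sketch of adding the declaration to the post wrapper follows. The content string and the URI "urn:example:w" are placeholders invented for this example; in practice you would use whatever URI the feed actually binds to the w prefix.

```csharp
using System;
using System.Xml;

// Invented stand-in for the decoded post content: it uses a "w" prefix
// without declaring it, which is what triggers the exception.
string inner = "<p>Hello <w:note>world</w:note></p>";

// "urn:example:w" is a placeholder URI, not the real one from the feed.
string wrapped = "<post xmlns:w=\"urn:example:w\">" + inner + "</post>";

XmlDocument myDocument = new XmlDocument();
myDocument.LoadXml(wrapped); // the prefix is now declared on the wrapper

// Querying prefixed elements needs the same prefix-to-URI mapping.
XmlNamespaceManager manager = new XmlNamespaceManager(myDocument.NameTable);
manager.AddNamespace("w", "urn:example:w");
XmlNode note = myDocument.SelectSingleNode("//w:note", manager);
Console.WriteLine(note.InnerText);
```

If the content uses other undeclared prefixes as well, each one needs its own xmlns declaration on the wrapper.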
