Parsing wrongly formatted XML-SOAP with C#

Parsing wrongly formatted XML-SOAP with C# - c#

I have malformed XML (SOAP) file which I need to parse. The issue is that XML doesn't have proper header tags.
I've tried to parse file with XDocument and XmlDocument but neither has worked. XML starts from the line 30, so maybe there is some way to skip those lines before file is read by XML parser?
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:eb="http://www.oasis-open.org/committees/ebxml-msg/schema/msg-header-2_0.xsd">
<SOAP-ENV:Header>
</SOAP-ENV:Header>
<SOAP-ENV:Body>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="Finvoice.xsl"?>
<GGVersion="2.01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="a.xsd">
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;
XmlReader r = XmlReader.Create(file.FullName, settings);
XmlDocument xDoc = new XmlDocument();
xDoc.PreserveWhitespace = true;
xDoc.LoadXml("<xml/>");
xDoc.DocumentElement.CreateNavigator().AppendChild(r);
XmlNamespaceManager manager = new XmlNamespaceManager(xDoc.NameTable);
Once trying to parse I get: Unexpected xml declaration. The xml declaration must be the first node in the document ....

If I understand you correctly, then the data you are looking for starts after the SOAP envelope. There is no garbage/unnessescary contents after the data you are looking for.
The SOAP header does not start with the XML declaration (<?xml version=, etc).
Looking for the start of the document
A simple solution is to find the start of the XML document (the data you are looking for), and chop away everything before that.
var startOfRealDocumentMarker = "<?xml version=\"1.0\"";
var startIndex = dirtyXmlString.IndexOf(startOfRealDocumentMarker);
if(startIndex == -1) {
throw new Exception("Start of XML not found. Now what?");
}
var cleanXmlString = dirtyXmlString.Substring(startIndex);
If the SOAP header also has an XML declaration, you could look for the end-tag of the SOAP envelope instead. Or you could start looking for the declaration at the 2nd character, so you would skip over the first one.
This is obviously not a fool-proof solution that will work in every case. But maybe it will work in all of your cases?
Skipping lines
If you're sure it will work to always start reading from line 30 of the input file, you can use this method instead.
XmlDocument xDoc = new XmlDocument();
using (var rdr = new StreamReader(pathToXmlFile))
{
// Skip until reader is positioned at start of line 30
for (var i = 0; i < 29; ++i)
{
rdr.ReadLine();
}
// Load document from current position of reader
xDoc.Load(rdr);
}

Related

XElement.Load() and "undeclared prefix" exception

I'm trying to load xml file using Xelement.Load() method and in case of some files, I get "ditaarch" is an undeclared prefix exception. The content of such troublesome xml's are similar to this simplified version:
<?xml version="1.0" encoding="UTF-8"?>
<concept ditaarch:DITAArchVersion="1.3">
<title>Test Title</title>
<menucascade>
<uicontrol>text</uicontrol>
<uicontrol/>
</menucascade>
</concept>
I've tried to follow suggestions to manually add or ignore "ditaarch" namespace using xml namespace manager:
using (XmlReader reader = XmlReader.Create(#"C:\test\example.xml"))
{
NameTable nameTable = new NameTable();
XmlNamespaceManager nameSpaceManager = new XmlNamespaceManager(nameTable);
nameSpaceManager.AddNamespace("ditaarch", "");
XmlParserContext parserContext = new XmlParserContext(null, nameSpaceManager, null, XmlSpace.None);
XElement elem = XElement.Load(reader);
}
But it leads to same exception as before. Most probably the solution is trivial but I just can't see it :(
If anyone would be able to point me in the right direction, I would be most grateful.

The presented markup is not namespace well-formed XML so I don't think XElement or XDocument is an option as it doesn't support colons in names. You can parse it with a legacy new XmlTextReader("foo.xml") { Namespaces = false } however.
And you could use XmlDocument instead of XDocument or XElement and check for any empty elements with e.g.
XmlDocument doc = new XmlDocument();
using (XmlReader xr = new XmlTextReader("example.xml") { Namespaces = false })
{
doc.Load(xr);
}
Console.WriteLine("Number of empty elements: {0}", doc.SelectNodes("//*[not(*)][not(normalize-space())]").Count);

How to working with xml nodes

I have program with response from website in the xml format with namespace.
example program:
string responsedata;//response data from website
//Creat new XMLdoc object for response data
XmlDocument ResponseDataXml = new XmlDocument();
ResponseDataXml.InnerXml= responsedata;
XmlNamespaceManager xnsm = new XmlNamespaceManager(ResponseDataXml.NameTable);
xnsm.AddNamespace("ps","http://example.com/namespace/ps");
//create xml document fro validating DTD
XmlDocument ValidateXml = new XmlDocument();
//select Nodes <ps:results> ... <result>
XmlNodeList NodesResults = ResponseDataXml.SelectNodes("ps:results/result");
foreach (XmlNode node in NodesResults)
{
ValidateXml.InnerText= "";
ValidateXml.InnerText += "<?xml version=\"1.0\" encoding=\"utf-8\"?>";
ValidateXml.InnerText += "<!DOCTYPE SouborN1A SYSTEM \"validate_dtd.dtd\">";
ValidateXml.InnerText+=node.InnerXml;
ValidateXml.Save("validate_temp.xml");
if (validate("validate_temp.xml"))//validate() return true if document is valid
{
Console.WriteLine("Result:" + node.Attributes["id"] + " is valid !!!!!");
// here i can append "result" node in new xml document "Valid_result.xml"
}
else
{
Console.WriteLine("Result:"+ node.Attributes["id"] + "i invalid !!!!!");
// here i can append "result" in new xml document invalid result in to "invalid_result.xml"
}
}
Input postdata:
<ps:report xmlns:ps="example.com.....">
<results xmlns:ps="example.com....." ps:Identifikation="999999">
<result id="11125">
.......
</result>
<result id="1100">
.......
</result>
<result id="111999055">
.......
</result>
<result id="100000">
.......
</result>
</results>
</ps:report>
Please help me... :)
I do not know how to proceed, and work with a given output,
I need mainly validate the item separately and then store in a xml file.
I apologize for my English.
Thanks.

Unfortunately, this question is fairly broad, but generally speaking everything can be done with XML libraries in C#. For example, use the XmlDocument class to create elements and then simply append those elements to the existing nodes.
class Program
{
static void Main(string[] args)
{
var output = new XmlDocument();
output.AppendChild(output.CreateXmlDeclaration("1.0", "utf-8", null));
var xmlns = new XmlNamespaceManager(output.NameTable);
var root = output.CreateElement("ps", "report", "http://example.com/namespace/ps");
var list = output.CreateElement("ps", "results", "http://example.com/namespace/ps");
list.Attributes.Append(output.CreateAttribute("Identifikation"));
list.Attributes[0].Value = "999999";
root.AppendChild(list);
var item = output.CreateElement("ps", "result", "http://example.com/namespace/ps");
item.Attributes.Append(output.CreateAttribute("id"));
item.Attributes[0].Value = "11125";
list.AppendChild(item);
//TODO: Append more validation messages.
output.AppendChild(root);
output.WriteTo(new XmlTextWriter(Console.Out));
System.Console.ReadLine();
}
}
Produces a document like this:
<?xml version="1.0" encoding="utf-8"?>
<ps:report xmlns:ps="http://example.com/namespace/ps">
<ps:results Identifikation="999999">
<ps:result id="11125" />
</ps:results>
</ps:report>

"Root element is missing" exception given when trying to parse XML file

I'm trying to set up parsing for a test XML generated with ksoap2 in Android:
<?xml version="1.0" encoding="utf-8"?>
<v:Envelope xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:d="http://www.w3.org/2001/XMLSchema" xmlns:c="http://schemas.xmlsoap.org/soap/encoding/" xmlns:v="http://schemas.xmlsoap.org/soap/envelope/">
<v:Header />
<v:Body>
<v:SOAPBODY>
<v:INFO i:type="v:INFO">
<v:LAITETUNNUS i:type="d:string">EI_TUNNUSTA</v:LAITETUNNUS>
</v:INFO>
<v:TOIMINNOT i:type="v:TOIMINNOT">
<v:TOIMINTA i:type="d:string">ASETUKSET_HAKU</v:TOIMINTA>
</v:TOIMINNOT>
<v:SISALTO i:type="v:SISALTO">
<v:KUVA i:type="d:string">AGFAFDGFDGFG</v:KUVA>
<v:MITTAUS i:type="d:string">12,42,12,4,53,12</v:MITTAUS>
</v:SISALTO>
</v:SOAPBODY>
</v:Body>
</v:Envelope>
But seemingly i can't parse it in any way. The exception is always that "Root element is not found" even when it goes through XML-validators like the one at w3schools. If i'm correct the contents of the body shouldn't be an issue when the problem is with root element.
The test code for parsing i try to use in C# is:
using (StreamReader streamreader = new StreamReader(Context.Request.InputStream))
{
try
{
XDocument xmlInput = new XDocument();
streamreader.BaseStream.Position = 0;
string tmp = streamreader.ReadToEnd();
var xmlreader = XmlReader.Create(streamreader.BaseStream);
xmlInput = XDocument.Parse(tmp);
xmlInput = XDocument.Load(xmlreader);
catch (Exception e)
{ }
where the xmlInput = XDocument.Parse(tmp); does indeed parse it to a XDocument, not a navigable one, though. Then xmlInput = XDocument.Load(xmlreader); throws the exception for not having a root element. I'm completely at loss here because i managed to parse and navigate the almost same xml with XMLDocument and XDocument classes before, and i fear i made some changes i didn't notice.
Thanks in advance.
Update: Here's the string tmp as requested :
"<?xml version=\"1.0\" encoding=\"utf-8\"?><v:Envelope xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:d=\"http://www.w3.org/2001/XMLSchema\" xmlns:c=\"http://schemas.xmlsoap.org/soap/encoding/\" xmlns:v=\"http://schemas.xmlsoap.org/soap/envelope/\"><v:Header /><v:Body><v:SOAPBODY><v:INFO i:type=\"v:INFO\"><v:LAITETUNNUS i:type=\"d:string\">EI_TUNNUSTA</v:LAITETUNNUS></v:INFO><v:TOIMINNOT i:type=\"v:TOIMINNOT\"><v:TOIMINTA i:type=\"d:string\">ASETUKSET_HAKU</v:TOIMINTA></v:TOIMINNOT><v:SISALTO i:type=\"v:SISALTO\"><v:KUVA i:type=\"d:string\">AGFAFDGFDGFG</v:KUVA><v:MITTAUS i:type=\"d:string\">12,42,12,4,53,12</v:MITTAUS></v:SISALTO></v:SOAPBODY></v:Body></v:Envelope>\r\n"
Update: Even with XDocument.Load(new StreamReader(Context.Request.InputStream, Encoding.UTF8)); the parsing will fail.

I believe you've read to the end of the stream once already, you need to reset the position in the stream again. see: "Root element is missing" error but I have a root element

Remove comments from an XML file with double dashes --

How can I remove invalid xml comments that contain double dashes(--) from an xml file?
I'm trying to load the xml file, but it is failing. These comments make the xml invalid. The xml comes from a vendor.
I tried removing these based on approaches from other posts, but I was not successful. Here is an example of the xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--MAIN VARIABLES-->
<content type="screwed">
<!--KEEP 19-39 -- SEE HELP.TXT AND THE VIDEO TUTORIALS FOR MORE INFO -->
<!--REGULAR/NON-Regular EXAMPLE --><SomeTag somefile="test.txt3" Name="test"/>
<!-- -->
</content>
I have tried the following without success:
string xmlDocFile = "c:\server\test.xml";
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.IgnoreComments = true;
readerSettings.ProhibitDtd = false;
readerSettings.ValidationType = ValidationType.DTD;
XmlReader reader = XmlReader.Create(xmlDocFile, readerSettings);
XmlDocument myXmlDoc = new XmlDocument();
myXmlDoc.Load(reader);
myXmlDoc.Save(xmlDocFile);

Before using XmlReader, parse xml file and filter comments using regexp.
// using System.Text.RegularExpressions;
System.IO.StreamReader file= new System.IO.StreamReader(xmlDocFile);
string validXml = Regex.Replace(file.ReadToEnd(),"<!--.*?-->","");
XmlReader reader = XmlReader.Create(validXml);

How to read processing instruction from an XML file using .NET 3.5

How to check whether an Xml file have processing Instruction
Example
<?xml-stylesheet type="text/xsl" href="Sample.xsl"?>
<Root>
<Child/>
</Root>
I need to read the processing instruction
<?xml-stylesheet type="text/xsl" href="Sample.xsl"?>
from the XML file.
Please help me to do this.

How about:
XmlProcessingInstruction instruction = doc.SelectSingleNode("processing-instruction('xml-stylesheet')") as XmlProcessingInstruction;

You can use FirstChild property of XmlDocument class and XmlProcessingInstruction class:
XmlDocument doc = new XmlDocument();
doc.Load("example.xml");
if (doc.FirstChild is XmlProcessingInstruction)
{
XmlProcessingInstruction processInfo = (XmlProcessingInstruction) doc.FirstChild;
Console.WriteLine(processInfo.Data);
Console.WriteLine(processInfo.Name);
Console.WriteLine(processInfo.Target);
Console.WriteLine(processInfo.Value);
}
Parse Value or Data properties to get appropriate values.

How about letting the compiler do more of the work for you:
XmlDocument Doc = new XmlDocument();
Doc.Load(openFileDialog1.FileName);
XmlProcessingInstruction StyleReference =
Doc.OfType<XmlProcessingInstruction>().Where(x => x.Name == "xml-stylesheet").FirstOrDefault();

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Parsing wrongly formatted XML-SOAP with C# - c#

Related

XElement.Load() and "undeclared prefix" exception

How to working with xml nodes

"Root element is missing" exception given when trying to parse XML file

Remove comments from an XML file with double dashes --

How to read processing instruction from an XML file using .NET 3.5

Categories

Resources