I'm trying to set up parsing for a test XML generated with ksoap2 in Android:
<?xml version="1.0" encoding="utf-8"?>
<v:Envelope xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:d="http://www.w3.org/2001/XMLSchema" xmlns:c="http://schemas.xmlsoap.org/soap/encoding/" xmlns:v="http://schemas.xmlsoap.org/soap/envelope/">
<v:Header />
<v:Body>
<v:SOAPBODY>
<v:INFO i:type="v:INFO">
<v:LAITETUNNUS i:type="d:string">EI_TUNNUSTA</v:LAITETUNNUS>
</v:INFO>
<v:TOIMINNOT i:type="v:TOIMINNOT">
<v:TOIMINTA i:type="d:string">ASETUKSET_HAKU</v:TOIMINTA>
</v:TOIMINNOT>
<v:SISALTO i:type="v:SISALTO">
<v:KUVA i:type="d:string">AGFAFDGFDGFG</v:KUVA>
<v:MITTAUS i:type="d:string">12,42,12,4,53,12</v:MITTAUS>
</v:SISALTO>
</v:SOAPBODY>
</v:Body>
</v:Envelope>
But I can't seem to parse it in any way. The exception is always "Root element is not found", even though the XML passes validators such as the one at w3schools. If I'm correct, the contents of the body shouldn't matter when the problem is with the root element.
The test code I use for parsing in C# is:
using (StreamReader streamreader = new StreamReader(Context.Request.InputStream))
{
try
{
XDocument xmlInput = new XDocument();
streamreader.BaseStream.Position = 0;
string tmp = streamreader.ReadToEnd();
var xmlreader = XmlReader.Create(streamreader.BaseStream);
xmlInput = XDocument.Parse(tmp);
xmlInput = XDocument.Load(xmlreader);
}
catch (Exception e)
{ }
}
where xmlInput = XDocument.Parse(tmp); does indeed parse it into an XDocument, though not a navigable one. Then xmlInput = XDocument.Load(xmlreader); throws the exception about the missing root element. I'm completely at a loss here, because I managed to parse and navigate almost the same XML with the XmlDocument and XDocument classes before, and I fear I made some change I didn't notice.
Thanks in advance.
Update: Here's the string tmp as requested:
"<?xml version=\"1.0\" encoding=\"utf-8\"?><v:Envelope xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:d=\"http://www.w3.org/2001/XMLSchema\" xmlns:c=\"http://schemas.xmlsoap.org/soap/encoding/\" xmlns:v=\"http://schemas.xmlsoap.org/soap/envelope/\"><v:Header /><v:Body><v:SOAPBODY><v:INFO i:type=\"v:INFO\"><v:LAITETUNNUS i:type=\"d:string\">EI_TUNNUSTA</v:LAITETUNNUS></v:INFO><v:TOIMINNOT i:type=\"v:TOIMINNOT\"><v:TOIMINTA i:type=\"d:string\">ASETUKSET_HAKU</v:TOIMINTA></v:TOIMINNOT><v:SISALTO i:type=\"v:SISALTO\"><v:KUVA i:type=\"d:string\">AGFAFDGFDGFG</v:KUVA><v:MITTAUS i:type=\"d:string\">12,42,12,4,53,12</v:MITTAUS></v:SISALTO></v:SOAPBODY></v:Body></v:Envelope>\r\n"
Update: Even with XDocument.Load(new StreamReader(Context.Request.InputStream, Encoding.UTF8)); parsing still fails.
I believe you've already read to the end of the stream once; you need to reset the position in the stream before reading from it again. See: "Root element is missing" error but I have a root element
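A minimal sketch of that fix, assuming the request stream is seekable. A MemoryStream stands in for Context.Request.InputStream here, and the sample XML is made up for illustration. Reading the stream to the end leaves its position at the end, so a second reader finds no data unless the position is rewound first:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Stand-in for Context.Request.InputStream (assumption: the real stream is seekable).
var input = new MemoryStream(Encoding.UTF8.GetBytes("<root><child>value</child></root>"));

// First pass: read the whole body as text. This moves the position to the end.
var streamReader = new StreamReader(input, Encoding.UTF8);
string tmp = streamReader.ReadToEnd();

// Loading from the exhausted stream fails, because there is nothing left to read.
bool failedWithoutRewind = false;
try
{
    XDocument.Load(XmlReader.Create(input));
}
catch (XmlException)
{
    failedWithoutRewind = true;
}

// Rewind before the second pass; now loading from the stream succeeds.
input.Position = 0;
XDocument fromStream = XDocument.Load(XmlReader.Create(input));

// Parsing the string that was already read does not touch the stream at all.
XDocument fromString = XDocument.Parse(tmp);

Console.WriteLine(fromStream.Root?.Name);
```

In the question's code only one of the two calls is needed: either parse the string that was already read, or rewind the stream and load from it.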
Related
I'm trying to load an XML file using the XElement.Load() method, and for some files I get a "'ditaarch' is an undeclared prefix" exception. The content of the troublesome XML files is similar to this simplified version:
<?xml version="1.0" encoding="UTF-8"?>
<concept ditaarch:DITAArchVersion="1.3">
<title>Test Title</title>
<menucascade>
<uicontrol>text</uicontrol>
<uicontrol/>
</menucascade>
</concept>
I've tried to follow suggestions to manually add or ignore "ditaarch" namespace using xml namespace manager:
using (XmlReader reader = XmlReader.Create(@"C:\test\example.xml"))
{
NameTable nameTable = new NameTable();
XmlNamespaceManager nameSpaceManager = new XmlNamespaceManager(nameTable);
nameSpaceManager.AddNamespace("ditaarch", "");
XmlParserContext parserContext = new XmlParserContext(null, nameSpaceManager, null, XmlSpace.None);
XElement elem = XElement.Load(reader);
}
But it leads to the same exception as before. The solution is most probably trivial, but I just can't see it :(
If anyone would be able to point me in the right direction, I would be most grateful.
The presented markup is not namespace well-formed XML, so I don't think XElement or XDocument is an option, as LINQ to XML doesn't support colons in names. You can, however, parse it with the legacy new XmlTextReader("foo.xml") { Namespaces = false }.
And you could use XmlDocument instead of XDocument or XElement and check for any empty elements with e.g.
XmlDocument doc = new XmlDocument();
using (XmlReader xr = new XmlTextReader("example.xml") { Namespaces = false })
{
doc.Load(xr);
}
Console.WriteLine("Number of empty elements: {0}", doc.SelectNodes("//*[not(*)][not(normalize-space())]").Count);
I have a malformed XML (SOAP) file which I need to parse. The issue is that the XML doesn't have proper header tags.
I've tried to parse the file with XDocument and XmlDocument, but neither has worked. The XML starts at line 30, so maybe there is some way to skip those lines before the file is read by the XML parser?
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:eb="http://www.oasis-open.org/committees/ebxml-msg/schema/msg-header-2_0.xsd">
<SOAP-ENV:Header>
</SOAP-ENV:Header>
<SOAP-ENV:Body>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="Finvoice.xsl"?>
<GGVersion="2.01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="a.xsd">
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;
XmlReader r = XmlReader.Create(file.FullName, settings);
XmlDocument xDoc = new XmlDocument();
xDoc.PreserveWhitespace = true;
xDoc.LoadXml("<xml/>");
xDoc.DocumentElement.CreateNavigator().AppendChild(r);
XmlNamespaceManager manager = new XmlNamespaceManager(xDoc.NameTable);
When I try to parse it I get: Unexpected xml declaration. The xml declaration must be the first node in the document ....
If I understand you correctly, the data you are looking for starts after the SOAP envelope, and there is no garbage or unnecessary content after the data you are looking for.
The SOAP header does not start with the XML declaration (<?xml version=, etc).
Looking for the start of the document
A simple solution is to find the start of the XML document (the data you are looking for), and chop away everything before that.
var startOfRealDocumentMarker = "<?xml version=\"1.0\"";
var startIndex = dirtyXmlString.IndexOf(startOfRealDocumentMarker);
if(startIndex == -1) {
throw new Exception("Start of XML not found. Now what?");
}
var cleanXmlString = dirtyXmlString.Substring(startIndex);
If the SOAP header also has an XML declaration, you could look for the end-tag of the SOAP envelope instead. Or you could start looking for the declaration at the 2nd character, so you would skip over the first one.
This is obviously not a fool-proof solution that will work in every case. But maybe it will work in all of your cases?
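For the variant where the SOAP part carries its own XML declaration, a sketch along these lines could skip the first match. FindSecondDeclaration and the sample input are hypothetical names and data invented for illustration, and the marker string is the same one assumed above:

```csharp
using System;

// Returns the index of the second XML declaration, or -1 when there are fewer than two.
static int FindSecondDeclaration(string text)
{
    const string marker = "<?xml version=\"1.0\"";
    int first = text.IndexOf(marker, StringComparison.Ordinal);
    if (first == -1)
    {
        return -1;
    }
    // Resume the search one character past the first match so it is skipped.
    return text.IndexOf(marker, first + 1, StringComparison.Ordinal);
}

// Made-up input: an envelope with its own declaration, followed by the real document.
string dirtyXmlString =
    "<?xml version=\"1.0\"?><soap/>\n" +
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><data/>";

int startIndex = FindSecondDeclaration(dirtyXmlString);
if (startIndex == -1)
{
    throw new Exception("Second XML declaration not found.");
}
string cleanXmlString = dirtyXmlString.Substring(startIndex);
Console.WriteLine(cleanXmlString);
```

Like the simpler version, this relies on the declaration text being spelled exactly as in the marker, so it is only as robust as that assumption.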
Skipping lines
If you're sure it will work to always start reading from line 30 of the input file, you can use this method instead.
XmlDocument xDoc = new XmlDocument();
using (var rdr = new StreamReader(pathToXmlFile))
{
// Skip until reader is positioned at start of line 30
for (var i = 0; i < 29; ++i)
{
rdr.ReadLine();
}
// Load document from current position of reader
xDoc.Load(rdr);
}
The XML that I get via a response stream:
<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<generateSSOResponse xmlns="http://url.com">
<generateSSOReturn>2DKtjZNq58THggh42lNsGvgGTjF8RSBA</generateSSOReturn>
</generateSSOResponse>
</soapenv:Body>
</soapenv:Envelope>
The code I use to try to get the "generateSSOResponse" token value:
var xmlDoc = XElement.Parse(s);
var ssoToken = xmlDoc.XPathSelectElement("/soapenv:Envelope[@xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\"]/soapenv:Body/generateSSOResponse[@xmlns=\"http://ws.configureone.com\"]/generateSSOReturn");
Error: Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function.
It says I need a namespace manager? I thought that was only for XmlDocument, not XElement. What's the solution here?
EDIT: the variable "s" comes from the response stream, like this:
using (var mem = new MemoryStream())
{
rstream.CopyTo(mem);
var b = mem.ToArray();
var s = System.Text.Encoding.UTF8.GetString(b);
}
Honestly, it'd be far simpler to use LINQ to XML as it was intended:
XNamespace ns = "http://url.com";
var token = (string)doc.Descendants(ns + "generateSSOReturn").Single();
See this fiddle for a working example. If you did want to use XPath then yes, you would need a namespace manager to allow the XPath navigator to resolve all the prefixes in your expression.
As an aside, you could also parse your XML direct from the stream:
var doc = XDocument.Load(rstream);
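If XPath is preferred, the namespace manager mentioned above can be passed to XPathSelectElement. The following is a sketch rather than a definitive implementation: the XML string is a shortened copy of the response from the question, and "rsp" is an arbitrary prefix chosen here for the payload's default namespace:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Shortened copy of the response from the question.
string s =
    "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">" +
    "<soapenv:Body><generateSSOResponse xmlns=\"http://url.com\">" +
    "<generateSSOReturn>2DKtjZNq58THggh42lNsGvgGTjF8RSBA</generateSSOReturn>" +
    "</generateSSOResponse></soapenv:Body></soapenv:Envelope>";

XDocument doc = XDocument.Parse(s);

// Map every prefix used in the XPath expression to its namespace URI.
// XPath has no notion of a default namespace, so the payload needs a prefix too.
var manager = new XmlNamespaceManager(new NameTable());
manager.AddNamespace("soapenv", "http://schemas.xmlsoap.org/soap/envelope/");
manager.AddNamespace("rsp", "http://url.com");

var tokenElement = doc.XPathSelectElement(
    "/soapenv:Envelope/soapenv:Body/rsp:generateSSOResponse/rsp:generateSSOReturn",
    manager);

// The explicit string conversion yields null when no element matched.
var token = (string)tokenElement;
Console.WriteLine(token);
```

The namespaces are matched through the prefixes in the path, so no predicates on xmlns attributes are needed.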
OK, so Charles Mager gave an answer using LINQ to XML, as I was trying to use XElement. However, it turns out the ERP the code is being embedded into doesn't support LINQ (bummer).
So here's the solution I got working without LINQ to XML:
XmlDocument mydoc = new XmlDocument();
XmlNamespaceManager manager = new XmlNamespaceManager(mydoc.NameTable);
manager.AddNamespace("soapenv", "http://schemas.xmlsoap.org/soap/envelope/");
manager.AddNamespace("xsd", "http://www.w3.org/2001/XMLSchema");
manager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
manager.AddNamespace("rsp","http://url.com");
mydoc.LoadXml(s);
var mytoken = mydoc.SelectSingleNode("//rsp:generateSSOReturn", manager);
Hope this helps anyone else who is in the same predicament as I was.
I have a simple XML file:
<?xml version="1.0" encoding="utf-8" ?>
<Config>
<NumOfBytesInRow>20</NumOfBytesInRow>
<FirstBaudRate>115200</FirstBaudRate>
<SecondBaudRate>34800</SecondBaudRate>
<DefaultPort>COM1</DefaultPort>
<NumOfTries>2</NumOfTries>
</Config>
And I'm trying to get the elements, but as soon as I open the file I get an exception saying that the root element is missing:
XmlDocument doc = new XmlDocument();
doc.Load(path);
EDIT
I have added:
if(File.Exists("D:\\BBConfig.xml"))
before the load. It found the file, and I still get the same error.
First of all, I find user3890766's answer very good: "This exception could be thrown if the method can't find the file". Nevertheless, you can try this to be sure:
string strXml;
try
{
using (StreamReader sr = new StreamReader("myXML.xml"))
{
strXml = sr.ReadToEnd();
}
XmlDocument doc = new XmlDocument();
doc.LoadXml(strXml);
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
This exception could be thrown if the method can't find the file. You need to check if your application can find the file at the given path, and have the authorization to read it.
To be sure, you could use a Stream, and check the Length. Then use XmlDocument.Load with this Stream.
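The length check suggested above might look like the sketch below. LoadConfig is a hypothetical helper name, and the temp file exists only to make the example self-contained. The assumption being tested is that the input is empty, which is one common way to end up with a "Root element is missing" error:

```csharp
using System;
using System.IO;
using System.Xml;

// Loads an XML file, reporting an empty file explicitly instead of
// leaving it to the parser to complain about a missing root element.
static XmlDocument LoadConfig(string path)
{
    // File.OpenRead throws FileNotFoundException when the file does not exist.
    using (var stream = File.OpenRead(path))
    {
        if (stream.Length == 0)
        {
            throw new InvalidDataException("File is empty: " + path);
        }
        var doc = new XmlDocument();
        doc.Load(stream);
        return doc;
    }
}

// Usage with a temporary file standing in for D:\BBConfig.xml.
string path = Path.GetTempFileName();
File.WriteAllText(path, "<Config><NumOfTries>2</NumOfTries></Config>");
XmlDocument config = LoadConfig(path);
Console.WriteLine(config.DocumentElement?.Name);
```

Separating the "file missing", "file empty" and "file malformed" cases this way makes it easier to see which one actually occurs.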
I am currently converting our old parsers that run on XmlDocument to XDocument. I do this mainly to get LINQ querying and the added line number info.
My xml contains an element like this:
<?xml version="1.0"?>
<fulltext>
hello this is a failed textnode
and I don't know how to parse it.
</fulltext>
My problem is that while XmlDocument seems to have no problem reading that node with:
var xmlDocument = new XmlDocument();
var physicalPath = GetPhysicalPath(uploadFolderFile);
try
{
xmlDocument.Load(physicalPath);
}
catch (XmlException xmlException)
{
_log.Warn("Problems with the document", xmlException);
}
The example above parses the document fine but when I try to do:
XDocument xmlDocument;
var physicalPath = GetPhysicalPath(uploadFolderFile);
var xmlStream = new System.IO.StreamReader(physicalPath);
try
{
xmlDocument = XDocument.Load(xmlStream, LoadOptions.SetLineInfo | LoadOptions.SetBaseUri);
}
catch (XmlException xmlException)
{
_log.Warn("Trying to clean document for HexaDecimal", xmlException);
}
It fails to read the document because of the character
The special character seems to be allowed in XML version 1.1, but changing the version in the declaration doesn't help.
I have thought about just parsing the document with XmlDocument and then converting it; but that seems to be counterintuitive. Can anybody help with this problem?
Ok...so I sort of found a solution to this problem.
First of all I try to parse the xml using the following code:
private XDocument GetXmlDocument(String physicalPath)
{
XDocument xmlDocument;
var xmlStream = new System.IO.StreamReader(physicalPath);
try
{
xmlDocument = XDocument.Load(xmlStream, LoadOptions.SetLineInfo);
}
catch (XmlException)
{
//_log.Warn("Trying to clean document for HexaDecimal", xmlException);
xmlDocument = XmlSanitizingStream.TryToCleanXMLBeforeParsing(physicalPath);
}
return xmlDocument;
}
If it fails to load the document, I try to clean it using the technique from this blog post:
http://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/
It will not remove the character I mentioned before, but it will remove any character not allowed by the XML standard.
Then, after sanitizing the XML, I create an XmlReader with its settings set to not check characters:
public static XDocument TryToCleanXMLBeforeParsing(String physicalPath)
{
string xml;
Encoding encoding;
using (var reader = new XmlSanitizingStream(File.OpenRead(physicalPath)))
{
xml = reader.ReadToEnd();
encoding = reader.CurrentEncoding;
}
byte[] encodedString;
if (encoding.Equals(Encoding.UTF8)) encodedString = Encoding.UTF8.GetBytes(xml);
else if (encoding.Equals(Encoding.UTF32)) encodedString = Encoding.UTF32.GetBytes(xml);
else encodedString = Encoding.Unicode.GetBytes(xml);
var ms = new MemoryStream(encodedString);
ms.Flush();
ms.Position = 0;
var settings = new XmlReaderSettings {CheckCharacters = false};
XmlReader xmlReader = XmlReader.Create(ms, settings);
var xmlDocument = XDocument.Load(xmlReader);
ms.Close();
return xmlDocument;
}
Since I've cleaned the document by removing illegal characters before telling the reader to skip character checks, I am pretty sure that I do not read a malformed XML document. In the worst case I get malformed XML and it throws an error anyway.
I only use this for parsing and it should only be used to read the data. This will not make the XML well-formed and will in many cases throw exceptions elsewhere in your code. I am only using this because I cannot change what the customer is sending us and I have to read it as is.
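As a rough illustration of the sanitizing idea, here is a sketch that filters a string with XmlConvert.IsXmlChar. This is a swapped-in technique, not the XmlSanitizingStream class from the blog post, and StripInvalidXmlChars is a hypothetical helper name. It assumes the text is already decoded into a string and that dropping the offending characters is acceptable:

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Drops every character that XML 1.0 does not allow, keeping valid surrogate pairs.
static string StripInvalidXmlChars(string text)
{
    var sb = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (XmlConvert.IsXmlChar(c))
        {
            sb.Append(c);
        }
        else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
        {
            // c is a high surrogate followed by its low surrogate: keep both.
            sb.Append(c).Append(text[i + 1]);
            i++;
        }
        // Anything else (for example most control characters) is skipped.
    }
    return sb.ToString();
}

// Made-up input containing the control character U+0001, which XML 1.0 forbids.
string dirty = "<fulltext>hello\u0001 world</fulltext>";
XDocument cleaned = XDocument.Parse(StripInvalidXmlChars(dirty));
Console.WriteLine(cleaned.Root?.Value);
```

Filtering this way changes the data, so as with the approach above it is only suitable when reading the content matters more than preserving it byte for byte.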