How do I obtain the SOAP child node values of username (Gusion)?
I am using C# in the backend.
<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:api="http://127.0.0.1/Integrics/Enswitch/API" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body><api:some_api_call soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<username xsi:type="xsd:string">Gusion</username>
</api:some_api_call>
</soapenv:Body>
</soapenv:Envelope>
I have tried the following, but it does not work:
public static string SoapNodeValue(string xmlString)
{
string soapString = xmlString;
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(soapString);
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xdoc.NameTable);
nsmgr.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsmgr.AddNamespace("api", "http://127.0.0.1/Integrics/Enswitch/API");
nsmgr.AddNamespace("xsd", "http://www.w3.org/2001/XMLSchema");
nsmgr.AddNamespace("soapenv", "http://schemas.xmlsoap.org/soap/envelope/");
nsmgr.AddNamespace("encodingStyle", "http://schemas.xmlsoap.org/soap/encoding/");
return xdoc.SelectSingleNode("/soapenv:Envelope/soapenv:Body/api:some_api_call/username", nsmgr).InnerText;
}
After playing around with your code, I saw that Visual Studio throws this error:
System.Xml.XPath.XPathException:
''/soapenv:Envelope/soapenv:Body/api:some_api_call/username' has an
invalid token.'
For whatever reason, the XPath string in your code contains an unexpected character.
Copy and paste my string instead, and it should work:
"/soapenv:Envelope/soapenv:Body/api:some_api_call/username"
To prove my point, I created a Python program:
import charade

def detect(s):
    try:
        # check it in the charade list
        if isinstance(s, str):
            return charade.detect(s.encode())
        # detecting the string
        else:
            return charade.detect(s)
    # in case of error
    # encode with 'utf-8' encoding
    except UnicodeDecodeError:
        return charade.detect(s.encode('utf-8'))

d1 = detect('/soapenv:Envelope/soapenv:Body/api:some_api_call/username')
print("d1 is encoded as : ", d1)
d2 = detect('/soapenv:Envelope/soapenv:Body/api:some_api_call/username')
print("d2 is encoded as : ", d2)
And the result,
d1 is encoded as : {'encoding': 'utf-8', 'confidence': 0.7525}
d2 is encoded as : {'encoding': 'ascii', 'confidence': 1.0}
Where d1 is the problem string.
I dug into it a bit more and finally found the culprit: there is an invisible character, U+200C (ZERO WIDTH NON-JOINER), in your string, and the XPath parser rejects it as an invalid token.
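If you want to check for this in C# rather than Python, the idea can be sketched roughly as follows. The class and method names (XPathSanitizer, DescribeNonAscii, StripFormatChars) are made up for illustration, and the position of the U+200C in the sample string is an assumption, since the invisible character cannot be seen in the question.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class XPathSanitizer
{
    // Hypothetical helper: lists every non-ASCII character with its index and code point.
    public static string DescribeNonAscii(string s) =>
        string.Join(", ", s.Select((c, i) => new { c, i })
                           .Where(t => t.c > 127)
                           .Select(t => $"index {t.i}: U+{(int)t.c:X4}"));

    // Hypothetical helper: drops Unicode "format" characters (category Cf),
    // which includes U+200C ZERO WIDTH NON-JOINER.
    public static string StripFormatChars(string s) =>
        new string(s.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.Format)
                    .ToArray());

    public static void Main()
    {
        // Assumption: the invisible character sits right before "username".
        string suspect = "/soapenv:Envelope/soapenv:Body/api:some_api_call/\u200Cusername";
        Console.WriteLine(DescribeNonAscii(suspect));
        Console.WriteLine(StripFormatChars(suspect));
    }
}
```

Stripping the character at runtime only hides the problem; retyping the XPath literal in the source file is the actual fix.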
Question
Should whitespace be ignored at the beginning of my multi-line string literal xml?
Code
string XML = @"
<?xml version=""1.0"" encoding=""utf-8"" ?>";
using (StringReader stringReader = new StringReader(XML))
using (XmlReader xmlReader = XmlReader.Create(stringReader,
new XmlReaderSettings() { IgnoreWhitespace = true }))
{
xmlReader.MoveToContent();
// further implementation withheld
}
Notice in the above code that there is whitespace before the XML declaration. It does not seem to be ignored despite my setting the IgnoreWhitespace property. Where am I going wrong?
Note: I get the same behaviour when the XML string has no line break, just a leading space, as below. I know this will run if I remove the whitespace; my question is why the property doesn't take care of this.
string XML = @" <?xml version=""1.0"" encoding=""utf-8"" ?>";
The documentation says that the IgnoreWhitespace property "Gets or sets a value indicating whether to ignore insignificant white space." While that leading whitespace (and line break) might seem insignificant, XmlReader does not treat it that way. Just trim the XML string before use, and you'll be fine.
As stated in comments and for clarity, change your code to:
string XML = @"<?xml version=""1.0"" encoding=""utf-8"" ?>";
using (StringReader stringReader = new StringReader(XML.Trim()))
using (XmlReader xmlReader = XmlReader.Create(stringReader,
new XmlReaderSettings() { IgnoreWhitespace = true }))
{
xmlReader.MoveToContent();
// further implementation withheld
}
According to Microsoft's documentation regarding XML Declaration
The XML declaration typically appears as the first line in an XML
document. The XML declaration is not required, however, if used it
must be the first line in the document and no other content or white
space can precede it.
The parse should fail for your code because white space precedes the XML declaration. Removing either the white space OR the xml declaration will result in a successful parse.
In other words it would be a bug if XmlReaderSettings were at odds with the documentation for XML Declaration - it is defined behavior.
Here's some code demonstrating the above rules.
using System;
using System.Web;
using System.Xml;
using System.Xml.Linq;
public class Program
{
public static void Main()
{
//The XML declaration is not required, however, if used it must
// be the first line in the document and no other content or
//white space can precede it.
// here, no problem because this does not have an XML declaration
string xml = @"
<xml></xml>";
XDocument doc = XDocument.Parse(xml);
Console.WriteLine(doc.Document.Declaration);
Console.WriteLine(doc.Document);
//
// problem here because this does have an XML declaration
//
xml = @"
<?xml version=""1.0"" encoding=""utf-8"" ?><xml></xml>";
try
{
doc = XDocument.Parse(xml);
Console.WriteLine(doc.Document.Declaration);
Console.WriteLine(doc.Document);
} catch(Exception e) {
Console.WriteLine(e.Message);
}
}
}
Below is the sample xml,
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam&Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
c#:
string xml = System.IO.File.ReadAllText(@"E:\Sample.xml");
xml = System.Text.RegularExpressions.Regex.Replace(xml, "<(?![_:a-z][-._:a-z0-9]*\b[^<>]*>)", "&lt;");
XDocument doc = XDocument.Parse(xml);
I need to convert the special characters (<, >, ", ', &) and I am using the above regex, but the Parse method throws an error. How can I resolve this?
Your current code converts the XML like this:
&lt;?xml version="1.0" encoding="utf-8"?>
&lt;UsersList>
&lt;User>
&lt;Name>sam&Tim&lt;/Name>
&lt;Address>21, bills street, CA&lt;/Address>
&lt;Issues>"Issues1", "Issues2"&lt;/Issues>
&lt;/User>
&lt;/UsersList>
whereas Parse expects something like this:
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam and Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
So you should not be converting < to &lt; at all. The unescaped & in sam&Tim is what prevents Parse from succeeding, so you can use
xml = xml.Replace("&", " n ");//n or and or some other char or string you want
instead of
xml = System.Text.RegularExpressions.Regex.Replace(xml, "<(?![_:a-z][-._:a-z0-9]*\b[^<>]*>)", "&lt;");
Hope this will help you to parse it.
You can give this a try:
string xml = System.IO.File.ReadAllText(@"E:\Sample.xml");
xml = ReplaceXMLEncodedCharacters(xml);
public string ReplaceXMLEncodedCharacters(string input)
{
const string pattern = @"&#(x?)([A-Fa-f0-9]+);";
MatchCollection matches = Regex.Matches(input, pattern);
int offset = 0;
foreach (Match match in matches)
{
int charCode = 0;
if (string.IsNullOrEmpty(match.Groups[1].Value))
charCode = int.Parse(match.Groups[2].Value);
else
charCode = int.Parse(match.Groups[2].Value, System.Globalization.NumberStyles.HexNumber);
char character = (char)charCode;
input = input.Remove(match.Index - offset, match.Length).Insert(match.Index - offset, character.ToString());
offset += match.Length - 1;
}
return input;
}
Your problem is that your original XML is not a well-formed XML document, because it contains an unescaped ampersand ('&'), which is explicitly forbidden by the standard:
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.
To make it valid, you must use &amp; instead of a literal &. Trying to "correct" it after the fact is impractical and a bad idea in the general case, because you can never be sure where in your XML an & stands for a literal & and where it is part of an XML entity. If it were possible to distinguish these usages unambiguously, that rule could be embedded in XML parsers and we would not have to deal with it.
A valid, standard-conformant representation of your document would be
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam&amp;Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
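If you control the code that produces the document, one way to avoid the problem altogether is to build the XML through an API instead of concatenating strings, so that escaping happens automatically. A minimal sketch with LINQ to XML, where the EscapeDemo class and BuildUser method are names invented for this example:

```csharp
using System;
using System.Xml.Linq;

public static class EscapeDemo
{
    // Hypothetical helper: XElement escapes the text content when serializing.
    public static string BuildUser(string name)
    {
        var user = new XElement("User", new XElement("Name", name));
        return user.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string xml = BuildUser("sam&Tim");
        Console.WriteLine(xml); // <User><Name>sam&amp;Tim</Name></User>

        // Parsing gives the literal ampersand back.
        Console.WriteLine(XElement.Parse(xml).Element("Name").Value); // sam&Tim
    }
}
```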
I am trying to replace within a string
<?xml version="1.0" encoding="UTF-8"?>
<response success="true">
<output><![CDATA[
And
]]></output>
</response>
with nothing.
The problem I am running into is that the <, > and " characters interfere with the replace: the lines are not read as one complete string but are broken up wherever a <, > or " occurs. Here is what I have, but I know it isn't right:
String responseString = reader.ReadToEnd();
responseString.Replace(@"<<?xml version=""1.0"" encoding=""UTF-8""?><response success=""true""><output><![CDATA[[", "");
responseString.Replace(@"]]\></output\></response\>", "");
What would be the correct code to get the replace to see these lines as just a string?
Strings are immutable: Replace never changes the original string but returns a new one, so you have to use its return value. The Replace method works as follows:
string x = "AAA";
string y = x.Replace("A", "B");
//x == "AAA", y == "BBB"
However, the real problem is how you handle the XML response data.
You should reconsider your approach of handling incoming XML by string replacement. Just get the CDATA content using the standard XML library. It's as easy as this:
using System.Linq;
using System.Xml.Linq;
...
XDocument doc = XDocument.Load(reader);
var responseString = doc.Descendants("output").First().Value;
The CDATA will already be removed. This tutorial will teach more about working with XML documents in C#.
Given your document structure, you could simply say something like this:
string response = @"<?xml version=""1.0"" encoding=""UTF-8""?>"
+ @"<response success=""true"">"
+ @" <output><![CDATA["
+ @"The output is some arbitrary text and it may be found here."
+ "]]></output>"
+ "</response>"
;
XmlDocument document = new XmlDocument() ;
document.LoadXml( response ) ;
bool success ;
bool.TryParse( document.DocumentElement.GetAttribute("success"), out success) ;
string content = document.DocumentElement.InnerText ;
Console.WriteLine( "The response indicated {0}." , success ? "success" : "failure" ) ;
Console.WriteLine( "response content: {0}" , content ) ;
And see the expected results on the console:
The response indicated success.
response content: The output is some arbitrary text and it may be found here.
If your XML document is a wee bit more complex, you can easily select the desired node(s) using an XPath query, thus:
string content = document.SelectSingleNode( @"/response/output" ).InnerText;
I have a xml-document that simplified looks like this:
<?xml version="1.0" encoding="utf-8"?>
<Node1 separator=" " />
There is a \t as attribute value.
When executing this code
var path = @"C:\test.xml";
var doc = XDocument.Load(path);
doc.Save(path);
the attribute value changed from tab to space.
<?xml version="1.0" encoding="utf-8"?>
<Node1 separator=" " />
Is there a way to preserve the origin value, because it is required to be a tab?
This is attribute-value normalization, which is part of the XML specification and is the default behavior when handling XML documents:
For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value
You should be able to avoid it by loading through an XmlTextReader, whose Normalization property is false by default. XmlDocument.Load accepts a reader:
var path = @"C:\test.xml";
XmlDocument doc = new XmlDocument();
XmlTextReader reader = new XmlTextReader(path);
doc.Load(reader);
var s = doc.SelectSingleNode("*/@*").InnerText;
Console.WriteLine("|{0}|, {1}", (int)s[0], s.Length); // prints 9 - ASCII code of tab
doc.Save(path);
I have a web service that returns an xml string as results. The return string is in this format:
<ReturnValue>
<ErrorNumber>0
</ErrorNumber>
<Message>my message</Message>
</ReturnValue>
The data that I want to insert into the "Message" tag is a serialized version of a custom object. The serialized form of that object contains XML and namespace declarations. When that gets put into the "Message" tag of my return XML string, XmlSpy says that it's not well-formed. How should I get rid of the namespace declarations, or is there a different way to embed a serialized object into an XML string?
Wrap the string in CDATA like so:
<![CDATA[your xml, which can be multi-line]]>
CDATA tells the parser to treat the contents as literal character data rather than markup. It's often the most expedient way to embed XML (or taggy non-XML content) as a string. You can run into problems if your embedded XML contains its own CDATA section (its closing ]]> would end yours early), but otherwise it's a simple fix.
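The nested-CDATA problem can be worked around by splitting every ]]> in the payload across two CDATA sections. The following is only a sketch of that common trick; CDataDemo and WrapInCData are hypothetical names, and the payload is invented for the example.

```csharp
using System;
using System.Xml.Linq;

public static class CDataDemo
{
    // Hypothetical helper: wraps text in CDATA, splitting any "]]>" so that
    // the "]]" ends one section and the ">" starts the next.
    public static string WrapInCData(string text) =>
        "<![CDATA[" + text.Replace("]]>", "]]]]><![CDATA[>") + "]]>";

    public static void Main()
    {
        // Assumed payload that itself contains a CDATA section.
        string payload = "<inner><![CDATA[x < y]]></inner>";
        string xml = "<Message>" + WrapInCData(payload) + "</Message>";

        // Value concatenates the adjacent CDATA sections back into one string.
        string roundTripped = XElement.Parse(xml).Value;
        Console.WriteLine(roundTripped == payload);
    }
}
```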
Just make sure that your <Message> XML is encoded so that <, >, ", and & show up as &lt;, &gt;, &quot; and &amp;, respectively.
There are a few built-in ways to encode the characters:
string message = System.Web.HttpUtility.HtmlEncode(serializedXml);
string message = System.Security.SecurityElement.Escape(serializedXml);
Using an XmlTextWriter to do the work for you
Use CDATA to wrap your XML
Also, this is probably a duplicate of:
Best way to encode text data for XML
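As a rough sketch of the SecurityElement.Escape option from the list above: the serialized object is escaped before being placed inside <Message>, and reading the element back yields the original markup. MessageEmbedDemo, BuildReturnValue and the urn:example payload are made up for this example.

```csharp
using System;
using System.Security;
using System.Xml.Linq;

public static class MessageEmbedDemo
{
    // Hypothetical helper: escapes the payload so it is carried as text, not markup.
    public static string BuildReturnValue(string serializedXml) =>
        "<ReturnValue><ErrorNumber>0</ErrorNumber><Message>"
        + SecurityElement.Escape(serializedXml)
        + "</Message></ReturnValue>";

    public static void Main()
    {
        // Assumed stand-in for the serialized custom object.
        string serialized = "<Foo xmlns=\"urn:example\">bar</Foo>";
        string xml = BuildReturnValue(serialized);
        Console.WriteLine(xml);

        // The parser unescapes the text, so the consumer gets the original string.
        string message = XElement.Parse(xml).Element("Message").Value;
        Console.WriteLine(message == serialized);
    }
}
```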
Think of XML as a document not a string.
Create a node named "wrapper", and store the content of your file in it as a Base64 encoded string. The results will look like this.
<ReturnValue>
<ErrorNumber>0</ErrorNumber>
<Message>my message</Message>
<wrapper type="bin.base64">PD94bWwgdmVyc2lvbj0iMS4wIj8+PHhzbDpzdHlsZXNoZWV0IHZ
lcnNpb249IjEuMCIgeG1sbnM6eHNsPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L1hTTC9UcmFuc2Zvcm0
iIHhtbG5zOm1zeHNsPSJ1cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOnhzbHQiPjx4c2w6b3V0cHV0IG1
ldGhvZD0ieG1sIiAvPjx4c2w6dGVtcGxhdGUgbWF0Y2g9Ii8iPjwveHNsOnRlbXBsYXRlPjwveHNsOnN
0eWxlc2hlZXQ+</wrapper>
</ReturnValue>
The following code shows how to add the wrapper, encode the content. Then it reverses the process to show that it all "works".
Using Base64 in XML has a number of other applications as well. For example embedding images, or other documents in XML content.
using System;
using System.IO;
using System.Xml;
public class t
{
static public string EncodeTo64(string toEncode) {
// UTF-8 rather than ASCII, so non-ASCII characters survive the round trip
byte[] toEncodeAsBytes = System.Text.Encoding.UTF8.GetBytes(toEncode);
string returnValue = System.Convert.ToBase64String(toEncodeAsBytes);
return returnValue;
}
static public string DecodeFrom64(string encodedData) {
byte[] encodedDataAsBytes = System.Convert.FromBase64String(encodedData);
string returnValue = System.Text.Encoding.UTF8.GetString(encodedDataAsBytes);
return returnValue;
}
public static void Main() {
try {
//Create the XmlDocument.
XmlDocument doc = new XmlDocument();
doc.LoadXml( @"
<ReturnValue>
<ErrorNumber>0</ErrorNumber>
<Message>my message</Message>
</ReturnValue>
");
XmlNode nodeMessage = doc.SelectSingleNode( "/ReturnValue/Message" );
if( nodeMessage != null ) {
XmlDocument docImport = new XmlDocument();
docImport.Load( "docwithnamespace.xml" );
// create a wrapper element for the file, then import and append it after <Message>
XmlElement nodeWrapper = (XmlElement)doc.CreateElement( "wrapper" );
nodeWrapper.SetAttribute( "type", "bin.base64" );
nodeWrapper = (XmlElement)doc.ImportNode( nodeWrapper, true );
XmlNode ndImport = nodeMessage.ParentNode.AppendChild( nodeWrapper.CloneNode( true ) );
ndImport.InnerText = EncodeTo64( docImport.OuterXml );
doc.Save( "wrapperadded.xml" );
// Next, let's test un-doing the wrapping
// Re-load the "wrapped" document
XmlDocument docSaved = new XmlDocument();
docSaved.Load( "wrapperadded.xml" );
// Get the wrapped element, decode from base64 write to disk
XmlNode node = docSaved.SelectSingleNode( "/ReturnValue/wrapper" );
if( node != null ) {
// Load the content, and save as a new XML
XmlDocument docUnwrapped = new XmlDocument();
docUnwrapped.LoadXml( DecodeFrom64( node.InnerText ) );
docUnwrapped.Save( "unwrapped.xml" );
Console.WriteLine( "Eureka" );
}
}
} catch( Exception e ) {
Console.WriteLine(e.Message);
}
}
}