C# strip away xml from a string


My Xamarin mobile app consumes a SOAP service. When I make a login request, the response contains both JSON and XML. I am interested only in the JSON string. Can anyone tell me how to parse the following response?
[{"Result":"true","HasError":false,"UserMsg":null,"ErrorMsg":null,"TransporterID":"327f6da2-d797-e311-8a6f-005056a34fa8"}]
<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><LoginResponse xmlns="http://tempuri.org/" /></soap:Body></soap:Envelope>

You can use the Substring method as follows:
string response = "<<The response you got>>";
string jsonResponse = response.Substring(0, response.IndexOf("<?"));
The 0 is the starting index (where extraction begins), and IndexOf returns the index of <?, which marks the start of the XML part of the response. See the documentation for the String.Substring method for details.
This isolates the JSON part of the response, which you can then parse with a JSON library.
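One caveat with the approach above: if the response happens to contain no XML declaration, IndexOf returns -1 and Substring throws. A minimal sketch of a guarded version follows; the helper name ExtractLeadingJson and the shortened sample response are assumptions made for illustration, not part of the original service.

```csharp
using System;

public static class Program
{
    // Hypothetical helper: returns the text before the XML declaration,
    // or the whole input when no declaration is present.
    public static string ExtractLeadingJson(string response)
    {
        // Ordinal comparison avoids culture-sensitive matching surprises.
        int xmlStart = response.IndexOf("<?xml", StringComparison.Ordinal);
        string json = xmlStart >= 0 ? response.Substring(0, xmlStart) : response;
        return json.Trim();
    }

    public static void Main()
    {
        // Shortened stand-in for the real response (assumption).
        string response = "[{\"Result\":\"true\"}]\n<?xml version=\"1.0\" encoding=\"utf-8\"?><a />";
        Console.WriteLine(ExtractLeadingJson(response)); // [{"Result":"true"}]
    }
}
```

The Trim call removes the line break that separates the JSON from the XML declaration.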

String.IndexOf will give you the index of the xml part, then you can use String.Substring:
string str = @"[{""Result"":""true"",""HasError"":false,""UserMsg"":null,""ErrorMsg"":null,""TransporterID"":""327f6da2-d797-e311-8a6f-005056a34fa8""}]
<?xml version=""1.0"" encoding=""utf-8""?><soap:Envelope xmlns:soap=""http://www.w3.org/2003/05/soap-envelope"" xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema""><soap:Body><LoginResponse xmlns=""http://tempuri.org/"" /></soap:Body></soap:Envelope>";
string json = str.Substring(0, str.IndexOf("<?xml"));
Console.WriteLine(json); // [{"Result":"true","HasError":false,"UserMsg":null,"ErrorMsg":null,"TransporterID":"327f6da2-d797-e311-8a6f-005056a34fa8"}]

Related

C# - Getting XML Child Nodes From SOAP (application/soap+xml)

How do I obtain the SOAP child node values of username (Gusion)?
I am using C# in the backend.
<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:api="http://127.0.0.1/Integrics/Enswitch/API" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body><api:some_api_call soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<username xsi:type="xsd:string">Gusion</username>
</api:some_api_call>
</soapenv:Body>
</soapenv:Envelope>
I have tried using this but it does not work
public static string SoapNodeValue(string xmlString)
{
string soapString = xmlString;
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(soapString);
XmlNamespaceManager nsmgr = new XmlNamespaceManager(xdoc.NameTable);
nsmgr.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsmgr.AddNamespace("api", "http://127.0.0.1/Integrics/Enswitch/API");
nsmgr.AddNamespace("xsd", "http://www.w3.org/2001/XMLSchema");
nsmgr.AddNamespace("soapenv", "http://schemas.xmlsoap.org/soap/envelope/");
nsmgr.AddNamespace("encodingStyle", "http://schemas.xmlsoap.org/soap/encoding/");
return xdoc.SelectSingleNode("/soapenv:Envelope/soap‌env:Body/api:some_api_call/username", nsmgr).InnerText;
}

After trying your code, I saw that Visual Studio reports this error:
System.Xml.XPath.XPathException:
''/soapenv:Envelope/soap‌env:Body/api:some_api_call/username' has an
invalid token.'
For whatever reason, the XPath string you typed contains an unexpected character.
Copy and paste the following string instead, and it should work:
"/soapenv:Envelope/soapenv:Body/api:some_api_call/username"
To demonstrate, I wrote a small Python program:
import charade

def detect(s):
    try:
        # check it in the charade list
        if isinstance(s, str):
            return charade.detect(s.encode())
        # detecting the string
        else:
            return charade.detect(s)
    # in case of error
    # encode with 'utf-8' encoding
    except UnicodeDecodeError:
        return charade.detect(s.encode('utf-8'))

d1 = detect('/soapenv:Envelope/soap‌env:Body/api:some_api_call/username')
print ("d1 is encoded as : ", d1)
d2 = detect('/soapenv:Envelope/soapenv:Body/api:some_api_call/username')
print ("d2 is encoded as : ", d2)
And the result,
d1 is encoded as : {'encoding': 'utf-8', 'confidence': 0.7525}
d2 is encoded as : {'encoding': 'ascii', 'confidence': 1.0}
Here d1 is the problem string.
Digging further, I found the culprit: there is an invisible character, U+200C (ZERO WIDTH NON-JOINER), inside "soapenv" in your XPath string, and the XPath parser treats it as an invalid token.
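The same check can be done directly in C#. The sketch below scans a string for non-ASCII characters and prints their code points, then strips the zero-width non-joiner. The invisible character is written as the escape \u200C so that it is visible in the source; the variable names are illustrative only.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        // The problem XPath, with the invisible character spelled out as an escape.
        string xpath = "/soapenv:Envelope/soap\u200Cenv:Body/api:some_api_call/username";

        // Report every character outside the ASCII range.
        for (int i = 0; i < xpath.Length; i++)
        {
            if (xpath[i] > 127)
            {
                Console.WriteLine("Index {0}: U+{1:X4}", i, (int)xpath[i]); // Index 22: U+200C
            }
        }

        // String.Replace(string, string) performs an ordinal search,
        // so it finds the zero-width character reliably.
        string cleaned = xpath.Replace("\u200C", "");
        Console.WriteLine(cleaned);
    }
}
```

Retyping the XPath by hand, rather than pasting it, also avoids carrying the hidden character over.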

xml string change the header encoding using c#

I have a string of XML.
How can I change the header from:
string xml = "<?xml version='1.0' encoding='ISO-8859-8'?>";
to
string xml = "<?xml version='1.0' encoding='UTF-8'?>";
using C#?
UPDATE
I tried to deserialize the XML into a User object:
XmlSerializer serializer = new XmlSerializer(typeof(User));
MemoryStream memStream = new MemoryStream(Encoding.UTF8.GetBytes(xml));
User user = (User)serializer.Deserialize(memStream);
but the strings in the resulting User object are not decoded correctly.
I believe the encoding declared in the XML is the cause, so I need to change it.

Instead of Encoding.UTF8.GetBytes use Encoding.GetEncoding("ISO-8859-8").GetBytes.
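Another option, sketched here under assumptions, is to skip the byte conversion entirely and deserialize from a StringReader. The text is already decoded when it is in a string, and the XML reader only switches encodings for stream input, so the encoding named in the declaration should not affect the result. The User class with a single Name property is hypothetical, since the original type is not shown.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the User type from the question.
public class User
{
    public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        string xml = "<?xml version='1.0' encoding='ISO-8859-8'?><User><Name>שלום</Name></User>";

        var serializer = new XmlSerializer(typeof(User));

        // Deserialize(TextReader) reads characters, not bytes,
        // so no Encoding.GetBytes round trip is involved.
        using (var reader = new StringReader(xml))
        {
            var user = (User)serializer.Deserialize(reader);
            Console.WriteLine(user.Name);
        }
    }
}
```

This only helps if the string itself was decoded correctly when it was first read; if the garbling happened earlier, the fix belongs where the bytes are turned into a string.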

If the XML is stored in a string variable and you only need to replace the value of the encoding attribute, you can do the replacement as follows:
const string searchEncoding = "ISO-8859-8";
const string newEncoding = "UTF-8";
string xml = @"<?xml version='1.0' encoding='ISO-8859-8'?><abc></abc>";
int encodingPos = xml.IndexOf(searchEncoding);
// 30 is the position of the encoding value in this exact declaration,
// so the replacement only happens when the match is inside the header.
if (encodingPos == 30)
{
    xml = xml.Substring(0, encodingPos) + newEncoding + xml.Substring(encodingPos + searchEncoding.Length);
}
However, a different process is necessary if the XML is stored in another datatype and/or you need to re-encode the XML content.

Xml Reading Issue using Xdocument

Below is the sample xml,
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam&Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
c#:
string xml = System.IO.File.ReadAllText(@"E:\Sample.xml");
xml = System.Text.RegularExpressions.Regex.Replace(xml, "<(?![_:a-z][-._:a-z0-9]*\b[^<>]*>)", "&lt;");
XDocument doc = XDocument.Parse(xml);
I need to convert the special characters (<, >, ", ', &) and I am using the regex above, but the Parse method throws an error. How can I resolve this?

Your current code converts the XML like this:
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam&Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
whereas Parse expects something like this:
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam and Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
So you should not be converting < to &lt;. The real obstacle is that the XML contains sam&Tim, and the bare & prevents Parse from succeeding. You can therefore use
xml = xml.Replace("&", " n "); // n, "and", or any other character or string you want
instead of
xml = System.Text.RegularExpressions.Regex.Replace(xml, "<(?![_:a-z][-._:a-z0-9]*\b[^<>]*>)", "&lt;");
Hope this will help you to parse it.

You can try the following:
// requires: using System.Text.RegularExpressions;
string xml = System.IO.File.ReadAllText(@"E:\Sample.xml");
xml = ReplaceXMLEncodedCharacters(xml);

// Replaces numeric character references (&#123; or &#x7B;) with the characters they denote.
public string ReplaceXMLEncodedCharacters(string input)
{
    const string pattern = @"&#(x?)([A-Fa-f0-9]+);";
    MatchCollection matches = Regex.Matches(input, pattern);
    int offset = 0;
    foreach (Match match in matches)
    {
        int charCode = 0;
        if (string.IsNullOrEmpty(match.Groups[1].Value))
            charCode = int.Parse(match.Groups[2].Value);
        else
            charCode = int.Parse(match.Groups[2].Value, System.Globalization.NumberStyles.HexNumber);
        char character = (char)charCode;
        // Each reference shrinks to one character, so later match positions shift left.
        input = input.Remove(match.Index - offset, match.Length).Insert(match.Index - offset, character.ToString());
        offset += match.Length - 1;
    }
    return input;
}

Your problem is that your original XML is not a well-formed XML document, because it contains an unescaped ampersand ('&'), which is explicitly forbidden by the standard:
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.
To make it well-formed, you must use &amp; instead of a literal &. Trying to "correct" it after the fact is impractical and a bad idea in the general case, because you can never be sure where in your XML an & stands for a literal & and where it begins an entity reference. If it were possible to distinguish these usages unambiguously, that rule could be built into XML parsers and we would not have to deal with it.
A valid, standard-conformant representation of your document would be
<?xml version="1.0" encoding="utf-8"?>
<UsersList>
<User>
<Name>sam&amp;Tim</Name>
<Address>21, bills street, CA</Address>
<Issues>"Issues1", "Issues2"</Issues>
</User>
</UsersList>
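If you control the code that produces the XML, the escaping does not have to be done by hand. As a small sketch (the element name Name comes from the sample; everything else is illustrative), LINQ to XML escapes the ampersand when writing and unescapes it when reading:

```csharp
using System;
using System.Xml.Linq;

public static class Program
{
    public static void Main()
    {
        // Writing: the literal & in the value is escaped automatically.
        var name = new XElement("Name", "sam&Tim");
        Console.WriteLine(name.ToString()); // <Name>sam&amp;Tim</Name>

        // Reading: the entity reference is turned back into a literal &.
        var parsed = XElement.Parse("<Name>sam&amp;Tim</Name>");
        Console.WriteLine(parsed.Value); // sam&Tim
    }
}
```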

How to use replace with tricky characters in C#?

I am trying to replace within a string
<?xml version="1.0" encoding="UTF-8"?>
<response success="true">
<output><![CDATA[
And
]]></output>
</response>
with nothing.
The problem I am running into is that the <, > and " characters interfere with the replace. It does not treat those lines as one continuous string but breaks the string whenever it reaches a <, > or ". Here is what I have, but I know it isn't right:
String responseString = reader.ReadToEnd();
responseString.Replace(@"<<?xml version=""1.0"" encoding=""UTF-8""?><response success=""true""><output><![CDATA[[", "");
responseString.Replace(@"]]\></output\></response\>", "");
What would be the correct code to get the replace to see these lines as just a string?

Strings are immutable, so Replace never changes the original string; it returns a new one that you have to assign:
string x = "AAA";
string y = x.Replace("A", "B");
//x == "AAA", y == "BBB"
However, the real problem is how you handle the XML response data.
You should reconsider handling incoming XML by string replacement. Just read the CDATA content using the standard XML library. It's as easy as this:
using System.Linq;
using System.Xml.Linq;
...
XDocument doc = XDocument.Load(reader);
var responseString = doc.Descendants("output").First().Value;
The CDATA markers will already be removed from the value. The System.Xml.Linq documentation covers more about working with XML documents in C#.

Given your document structure, you could simply say something like this:
string response = @"<?xml version=""1.0"" encoding=""UTF-8""?>"
+ @"<response success=""true"">"
+ @" <output><![CDATA["
+ @"The output is some arbitrary text and it may be found here."
+ "]]></output>"
+ "</response>"
;
XmlDocument document = new XmlDocument() ;
document.LoadXml( response ) ;
bool success ;
bool.TryParse( document.DocumentElement.GetAttribute("success"), out success) ;
string content = document.DocumentElement.InnerText ;
Console.WriteLine( "The response indicated {0}." , success ? "success" : "failure" ) ;
Console.WriteLine( "response content: {0}" , content ) ;
And see the expected results on the console:
The response indicated success.
response content: The output is some arbitrary text and it may be found here.
If your XML document is a wee bit more complex, you can easily select the desired node(s) using an XPath query, thus:
string content = document.SelectSingleNode( @"/response/output" ).InnerText;

How to correctly encode & in xml?

I'm requesting an XML document over the web. XDocument.Load(stream) throws an exception because the XML contains a bare &, and the parser therefore expects a terminating ; as in &amp;.
I read the stream into a string and replaced & with &amp;, but that broke all the other correctly encoded special characters, such as ø.
Is there a simple way to encode all disallowed characters in the string before parsing it into an XDocument?

Try CDATA Sections in xml
A CDATA section can only be used in places where you could have a text node.
<foo><![CDATA[Here is some data including < , > or & etc) ]]></foo>
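When the XML is generated in C#, a CDATA section can be produced with XCData rather than by concatenating the markers by hand. This is only a sketch of the writing side; it does not repair XML that has already been received in a broken state.

```csharp
using System;
using System.Xml.Linq;

public static class Program
{
    public static void Main()
    {
        // XCData wraps the text in <![CDATA[ ... ]]> when the element is serialized.
        var foo = new XElement("foo", new XCData("Here is some data including < , > or & etc"));

        Console.WriteLine(foo.ToString(SaveOptions.DisableFormatting));
        // <foo><![CDATA[Here is some data including < , > or & etc]]></foo>

        // Reading the value back returns the raw text without the CDATA markers.
        Console.WriteLine(foo.Value);
    }
}
```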

This kind of approach is discouraged, and the reason is in your question:
replacing & with &amp; turns an already-encoded &gt; into &amp;gt;.
A better option than any regex is to fix the source code that generates the incorrectly encoded XML.
I have come across .NET code that builds XML by string concatenation; an XML DOM should be used instead.
If you are able to modify that source code, do so, because repairing half-encoded XML after the fact cannot be guaranteed to work correctly.

@espvar,
This is an input XML:
<root><child>nospecialchars</child><specialchild>data&data</specialchild><specialchild2>You.. & I in this beautiful world</specialchild2>data&</root>
And the Main function:
string EncodedXML = encodeWithCDATA(XMLInput); //Calling our Custom function
XmlDocument xdDoc = new XmlDocument();
xdDoc.LoadXml(EncodedXML); //passed
The function encodeWithCDATA():
private string encodeWithCDATA(string stringXML)
{
    if (stringXML.IndexOf('&') != -1)
    {
        // Position of the '>' that closes the tag before the first '&'.
        int indexofClosingtag = stringXML.Substring(0, stringXML.IndexOf('&')).LastIndexOf('>');
        // Offset (relative to that '>') of the next '<'.
        int indexofNextOpeningtag = stringXML.Substring(indexofClosingtag).IndexOf('<');
        // Wrap only the text between the two tags; skip the '>' itself.
        string CDATAsection = string.Concat("<![CDATA[", stringXML.Substring(indexofClosingtag + 1, indexofNextOpeningtag - 1), "]]>");
        string encodedLeftPart = string.Concat(stringXML.Substring(0, indexofClosingtag + 1), CDATAsection);
        string UncodedRightPart = stringXML.Substring(indexofClosingtag + indexofNextOpeningtag);
        // Process the remainder recursively.
        return (string.Concat(encodedLeftPart, encodeWithCDATA(UncodedRightPart)));
    }
    else
    {
        return (stringXML);
    }
}
Encoded XML (i.e., xdDoc.OuterXml, indented here for readability):
<root>
<child>nospecialchars</child>
<specialchild>
<![CDATA[data&data]]>
</specialchild>
<specialchild2>
<![CDATA[You.. & I in this beautiful world]]>
</specialchild2>
<![CDATA[data&]]>
</root>
All I have used is Substring, IndexOf, string.Concat and a recursive function call. Let me know if you don't understand any part of the code.
The sample XML I provided also has text directly inside a parent node (mixed content), as in HTML, e.g. <div>this is <b>bold</b> text</div>, and my code also wraps text outside the <b> tag when it contains the special character &.
Please note that I have handled only '&'; the data must not contain characters such as '<', '>', single quotes or double quotes.
