Strip Out XML from string

I am calling an XML-RPC API and sometimes get a "dirty" response, meaning that other content (such as HTML) is returned along with the XML, like:
<div>Some Html maybe> Or some additional string is here
<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
<params>
<param>
<value><int>30</int></value>
</param>
</params>
</methodResponse>
I need a way to throw out anything that is not XML and read only the XML from the response string, so that from the response above I get only:
<?xml version="1.0" encoding="ISO-8859-1"?>
<methodResponse>
<params>
<param>
<value><int>30</int></value>
</param>
</params>
</methodResponse>
Failing that, it would help if someone could provide code that at least strips the HTML and leaves only the XML. I would prefer code in C#.

Try using a variation of this, or possibly use XSLT to filter the response you get back from the API with something similar to this. XSLT is actually pretty powerful when filtering XML. I know Visual Studio didn't support XSLT 2.0, but if you can use 2.0 in another editor, it's quite useful.
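For the simple case in the question, where the junk only precedes the XML, a plain string search may be enough. The following is a minimal sketch, assuming the payload always begins with an "<?xml" declaration; the class and method names (XmlExtractor, ExtractXml) are made up for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class XmlExtractor
{
    // Returns the XML portion of a response that may have leading junk.
    public static string ExtractXml(string response)
    {
        // Assumption: the XML payload starts at the "<?xml" declaration.
        int start = response.IndexOf("<?xml", StringComparison.Ordinal);
        if (start < 0)
            throw new FormatException("No XML declaration found in the response.");

        string candidate = response.Substring(start).Trim();

        // Parsing confirms that the remainder is well-formed; it throws
        // System.Xml.XmlException if junk also follows the root element.
        XDocument.Parse(candidate);
        return candidate;
    }
}
```

Calling XmlExtractor.ExtractXml(dirtyResponse) returns only the text from the declaration onward. If the junk can also appear after the closing root tag, this sketch throws rather than guessing where the XML ends.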

Related

Error when reading XML

I am currently writing an XML writer/reader. Writing to the XML file works, and now I am attempting to read from it. However, when I do so, the following error is thrown and I am not sure why:
'>' is an unexpected token. The expected token is '='. Line 6, position 16.
Could someone please shed some light on this for me?
The XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<assignments>
<assignment>
<ModuleTitle>Internet Programming</ModuleTitle>
<AssignmentTitle>Assignment 01</AssignmentTitle>
<Date Given>11/02/2015</Date Given>
<Date Due>20/02/2015</Date Due>
</assignment>
</assignments>
UPDATE:
The problem was the fact that in some of my tag names I had spaces, which was causing the error.

You have invalid spaces in the element names; the following will work:
// Requires: using System.Xml.Linq;
XElement config = XElement.Parse(
@"<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<assignments>
<assignment>
<ModuleTitle>Internet Programming</ModuleTitle>
<AssignmentTitle>Assignment 01</AssignmentTitle>
<DateGiven>11/02/2015</DateGiven>
<DateDue>20/02/2015</DateDue>
</assignment>
</assignments>");
Please note DateGiven and DateDue without spaces.
The spaces are the reason for the error.

<Date Given> is not valid XML syntax. The parser reads Given as an attribute, which must have a value, so it expects something like this: <Date Given="true">
Edit, to be useful in the future: as @James mentioned, it is just a space in the tag name, which is also invalid in XML.
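Since the file comes from your own writer, it may help to check element names before writing them. Here is a small sketch using System.Xml.XmlConvert; the wrapper class and method names (ElementNames, IsValidName, ToSafeName) are hypothetical.

```csharp
using System;
using System.Xml;

public static class ElementNames
{
    // True if the text can be used as an element name without a prefix.
    public static bool IsValidName(string name)
    {
        try
        {
            XmlConvert.VerifyNCName(name);
            return true;
        }
        catch (XmlException)
        {
            // Thrown for names containing spaces or other invalid characters.
            return false;
        }
    }

    // Escapes invalid characters, e.g. a space becomes _x0020_.
    public static string ToSafeName(string name)
    {
        return XmlConvert.EncodeLocalName(name);
    }
}
```

With this, IsValidName("Date Given") is false, and ToSafeName("Date Given") yields "Date_x0020_Given". Simply removing the space, as in DateGiven above, is usually the more readable choice.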

Check XML via XSD schemas which are specified in xsi:schemaLocation attribute

Sorry for my English.
C# 4.0, LINQ to XML.
I get XDocument from an XML file, for example:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../../support/localization.xslt"?>
<doc:resources xmlns:doc="http://mea-orbis.com/2012/XMLSchema/localization"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://mea-orbis.com/2012/XMLSchema/localization ../../support/localization.xsd">
<!--Table column headers-->
<doc:record id="commandName">Command</doc:record>
<doc:record id="commandNameDescript">Short description</doc:record>
<doc:record id="commandNameNotes">Note</doc:record>
<!--******************************************-->
<!--Command group names-->
<doc:record id="group1">Encoding conversion commands</doc:record>
<!--******************************************-->
<!--Command names, their short descriptions, and notes-->
<doc:record id="dwgconvertName">DWGCONVERT</doc:record>
<doc:record id="dwgconvertKeyWords">encoding</doc:record>
<doc:record id="dwgconvertDescr">conversion of the current drawing (AutoCAD versions up to and including 2011)</doc:record>
<doc:record id="dwgconvertcpName">DWGCONVERTCP</doc:record>
<doc:record id="dwgconvertcpKeyWords">encoding</doc:record>
<doc:record id="dwgconvertcpDescr">conversion of the current drawing (AutoCAD versions from 2008)</doc:record>
<doc:record id="dwgconvertfilesName">DWGCONVERTFILES</doc:record>
<doc:record id="dwgconvertfilesKeyW">encoding</doc:record>
<doc:record id="dwgconvertfilesDescr">conversion of drawings selected by the user</doc:record>
<doc:record id="dwgconvertstrName">DWGCONVERTSTR</doc:record>
<doc:record id="dwgconvertstrKeyW">encoding</doc:record>
<doc:record id="dwgconvertstrDescr">
conversion of an individual text entity (entities)
from the current drawing
</doc:record>
<doc:record id="ns">DWGCONVERT</doc:record>
<doc:record id="arxload">Load all ARX files</doc:record>
<doc:record id="netload">Load all DLL files</doc:record>
</doc:resources>
I need to validate the XDocument against its XSD schemas. I found two examples in MSDN:
first, second.
But in those samples, the XSD schema is supplied separately from the file. I don't want to do superfluous work, because the schemas are already specified in the xsi:schemaLocation attribute of my XML file.
What is the correct way to validate an XDocument object when all the necessary schemas are already specified in its xsi:schemaLocation attribute?
Regards

This may be a little late, but I found this question, and then I found this answer elsewhere on Stack Overflow: Validating an XML against referenced XSD in C#. I just checked that it works, at least for a locally stored XSD.

Processing of xsi attributes for schema locations is not built in the framework; you will have to do that yourself.
The way I've done it involves the following steps:
1) Read the schemaLocation or noNamespaceSchemaLocation attributes on your document's root element. This is where you have to come up with the solution that best fits your needs. If you don't care about performance, you can simply use the DOM-based API; it may mean going over the source XML twice: once to parse it into memory, then again to validate it. Alternatively, use a fast, forward-only reader to read just the attributes of the root node, looking for your xsi: ones, and stop reading once past the root element.
2) Once found, parse the attribute values. Typically you call string.Split() on whitespace (\t, \r, \n, 0x20), trimming everything, discarding empties, and making pairs (when namespaces are used). Ultimately, this gives you the list of URIs where your XSDs are located.
3) Resolve each URI to an absolute URI, converting any relative one using the absolute base URI of your XML file.
4) Build an XmlSchemaSet by adding all the XSDs, compile it, and use it for validation by getting a reader from your source XML.
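The steps above can be sketched roughly as follows for an XDocument that is already in memory. This is an illustration under assumptions, not a tested production routine: the class and method names (SchemaLocationValidator, ReadSchemaLocations, Validate) are invented here, only xsi:schemaLocation is handled (not noNamespaceSchemaLocation), and the caller is assumed to know the absolute base URI of the XML file. It uses XDocument.Validate rather than a validating reader.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.Schema;

public static class SchemaLocationValidator
{
    static readonly XNamespace Xsi = "http://www.w3.org/2001/XMLSchema-instance";

    // Steps 1 and 2: read xsi:schemaLocation from the root element and
    // split it into (namespace, location) pairs.
    public static List<KeyValuePair<string, string>> ReadSchemaLocations(XDocument doc)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        XAttribute attr = doc.Root.Attribute(Xsi + "schemaLocation");
        if (attr == null)
            return pairs;

        string[] tokens = attr.Value.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i + 1 < tokens.Length; i += 2)
            pairs.Add(new KeyValuePair<string, string>(tokens[i], tokens[i + 1]));
        return pairs;
    }

    // Steps 3 and 4: resolve each location against the base URI of the XML
    // file, build and compile the schema set, then validate the document.
    // Returns the validation messages; an empty list means no problems found.
    public static List<string> Validate(XDocument doc, Uri baseUri)
    {
        var schemas = new XmlSchemaSet();
        foreach (KeyValuePair<string, string> pair in ReadSchemaLocations(doc))
            schemas.Add(pair.Key, new Uri(baseUri, pair.Value).ToString());
        schemas.Compile();

        var errors = new List<string>();
        doc.Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

A call might look like SchemaLocationValidator.Validate(doc, new Uri(fullPathToXmlFile)), where fullPathToXmlFile is whatever absolute path the document was loaded from.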

Parse three specific elements from an XML snippet in C# 2.0

How can I parse the values of a few tags from my XML using C# 2.0?
I want to parse the tags and their values, like:
1) <v9:Severity>SUCCESS</v9:Severity>
2) <v9:TrackingNumber>634649515000016</v9:TrackingNumber>
3) <v9:Image>iVBORw0KGgoAAAANSUhEUgAAAyAAAASwAQAAAAAryhMIAAAagEl</v9:Image>
How do I get the values of the elements above programmatically with C# 2.0?
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<env:Header xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
<soapenv:Body>
<v9:ProcessShipmentReply xmlns:v9="http://fedex.com/ws/ship/v9">
<v9:HighestSeverity xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">SUCCESS</v9:HighestSeverity>
<v9:Notifications xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<v9:Severity>SUCCESS</v9:Severity>
<v9:Source>ship</v9:Source>
<v9:Code>0000</v9:Code>
<v9:Message>Success</v9:Message>
<v9:LocalizedMessage>Success</v9:LocalizedMessage>
</v9:Notifications>
<v9:CompletedShipmentDetail>
<v9:CompletedPackageDetails xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<v9:SequenceNumber>1</v9:SequenceNumber>
<v9:TrackingIds>
<v9:TrackingIdType>GROUND</v9:TrackingIdType>
<v9:TrackingNumber>634649515000016</v9:TrackingNumber>
</v9:TrackingIds>
<v9:Barcodes>
<v9:BinaryBarcodes>
<v9:Type>COMMON_2D</v9:Type>
<v9:Value>Wyk+HjAxHTAyMDI3ODAdODQwHTEzNx02MzQ2NDk1</v9:Value>
</v9:BinaryBarcodes>
<v9:StringBarcodes>
<v9:Type>GROUND</v9:Type>
<v9:Value>9612137634649515000016</v9:Value>
</v9:StringBarcodes>
</v9:Barcodes>
<v9:Label>
<v9:Type>OUTBOUND_LABEL</v9:Type>
<v9:ShippingDocumentDisposition>RETURNED</v9:ShippingDocumentDisposition>
<v9:Resolution>200</v9:Resolution>
<v9:CopiesToPrint>1</v9:CopiesToPrint>
<v9:Parts>
<v9:DocumentPartSequenceNumber>1</v9:DocumentPartSequenceNumber>
<v9:Image>iVBORw0KGgoAAAANSUhEUgAAAyAAAASwAQAAAAAryhMIAAAagEl</v9:Image>
</v9:Parts>
</v9:Label>
</v9:CompletedPackageDetails>
</v9:CompletedShipmentDetail>
</v9:ProcessShipmentReply>
</soapenv:Body>
</soapenv:Envelope>

Since you said that you use C# 2.0 (and thus cannot use LINQ to XML), the easiest way to find single values in your XML would be to use XPath:
You can use an XPathNavigator (MSDN: Select XML Data using XPathNavigator)
or you can use XmlNode.SelectNodes directly.
Since your XML contains namespaces <v9:...>, the issue gets a bit more complicated: You need to initialize an XmlNamespaceManager and pass it to the XPathNavigator. Here is a blog post that explains this issue in detail; an example can also be found at the XmlNode.SelectNodes MSDN page (see link above).
Query XML with Namespaces using XPathNavigator
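As a rough sketch of the XmlNode route with a namespace manager, written to stay within C# 2.0 syntax: the class and method names (ShipReplyReader, ReadValue) are invented for illustration, and the v9 namespace URI is taken from the XML in the question.

```csharp
using System;
using System.Xml;

public static class ShipReplyReader
{
    // Returns the text of the first node matching the XPath, or null if none.
    public static string ReadValue(XmlDocument doc, string xpath)
    {
        // Map the v9 prefix used in the XPath to the namespace URI declared
        // in the document; the prefix here need not match the document's.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("v9", "http://fedex.com/ws/ship/v9");

        XmlNode node = doc.SelectSingleNode(xpath, ns);
        return node == null ? null : node.InnerText;
    }
}
```

After loading the response with doc.LoadXml(...) or doc.Load(...), the three values could be read with paths such as "//v9:Notifications/v9:Severity", "//v9:TrackingNumber" and "//v9:Image".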

XML DocumentElement is trashing the innerXml

I have a simple XML file, shown below, which I read in via a basic XmlDocument.Load(filename.xml). If I load the file and inspect its InnerXml, it all looks normal. However, when I inspect the InnerXml of DocumentElement, it's a mess! I kept the example small, so you can easily see that it is not malformed:
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults>
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
Now, try this in C# with this simple code:
...
XmlDocument xDoc = new XmlDocument();
xDoc.Load("*XMLSAMPLE.XML*");
textBox1.Text = xDoc.InnerXml;
textBox2.Text = xDoc.DocumentElement.InnerXml;
...
It's completely mangled, with the second namespace repeated on every dd tag and not even included in the topmost tag.
What am I doing wrong? This is driving me nuts!

The content returned by xDoc.DocumentElement.InnerXml is semantically identical to your original ServiceDefaults tag - if the first fragment conforms to your XML schema, the InnerXml fragment will also conform to the definition of the inner element. Just because the framework has re-arranged the namespace declarations does not change the semantics of the document.
Compare the output of the two properties:
xDoc.InnerXml:
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults>
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
xDoc.DocumentElement.InnerXml:
<fax:ServiceDefaults xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/">
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">false</dd:AutoCompleteToNANP>
<dd:RetryInterval xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">0</dd:RetryInterval>
<dd:MaxRetryAttempts xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>

A look at the following link in MSDN will help shed light on your situation:
http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.innerxml.aspx
Basically, xDoc.DocumentElement.InnerXml is looking at the <fax:ServiceDefaults> node, whereas xDoc.InnerXml is looking one level higher (the FaxService node). This is crucial to understanding your problem, because all of your xmlns declarations are on the FaxService node.
Make the following change to your XML document, and notice what happens (basically, copy the xmlns info over to the ServiceDefaults node):
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
Suddenly your code will behave according to your expectations. So hopefully this helps you towards understanding the issue. What the permanent fix should be, that's up to you.
HTH!
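If the goal is simply to get the root element's markup without editing the file, OuterXml on the root element may be what you want, since it includes the root tag that carries the xmlns declarations. A minimal sketch, with a hypothetical helper name (RootMarkup.GetRootOuterXml):

```csharp
using System;
using System.Xml;

public static class RootMarkup
{
    // Returns the root element including its own start and end tags, so the
    // namespace declarations stay on the root and are not repeated on children.
    public static string GetRootOuterXml(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.OuterXml;
    }
}
```

In the question's code, that would mean assigning xDoc.DocumentElement.OuterXml to textBox2.Text instead of xDoc.DocumentElement.InnerXml.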

XML tag meaning

I have a part of xml file
<Text><?xml version="1.0" encoding="utf-16"?>
<ObjectFilter xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<FilterConditions>
<FilterCondition>
<PropertyFilters>
<PropertyFilter>
<PropertyName>Message</PropertyName>
<FilterValue xsi:type="xsd:string">PPM exceeds tolerance</FilterValue>
<FilterType>strExpr</FilterType>
<Operator>eq</Operator>
<CaseSensitive>true</CaseSensitive>
<Recursive>false</Recursive>
</PropertyFilter>
</PropertyFilters>
<Enabled>true</Enabled>
<ObjectTypeName>Spo.DataModel.UnixLogMessage</ObjectTypeName>
<ObjectClassGR>
<Guid>00000000-0000-0000-0000-000000000000</Guid>
</ObjectClassGR>
Here, what does the Recursive node mean? It should actually look like <Recursive>false</Recursive>,
but how come it appears as &lt;Recursive>false&lt;/Recursive>?
Can anyone help me with this?

How are you getting this XML file? From a webpage?
It seems that the way you are getting the text file treats it as an HTML document and thus turns your '<' into &lt; and your '>' into &gt;.
You need to ensure that the page is not interpreted as HTML. You could just copy-paste everything into Notepad first for a simple solution.
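If the escaped markup is in fact XML stored as text inside an outer element (an assumption, suggested by the <Text> wrapper in the question), then reading that element's value with an XML parser decodes the entities, and the result can be parsed again as its own document. A sketch with a made-up helper name (EmbeddedXml.Unwrap):

```csharp
using System;
using System.Xml.Linq;

public static class EmbeddedXml
{
    // Parses a wrapper element whose text content is an escaped XML document
    // and returns that inner document.
    public static XDocument Unwrap(string wrapperXml)
    {
        XElement wrapper = XElement.Parse(wrapperXml);

        // Value returns the text with &lt;, &gt; and &amp; already decoded.
        return XDocument.Parse(wrapper.Value.Trim());
    }
}
```

For example, EmbeddedXml.Unwrap(textElementXml).Root.Element("Recursive").Value would give the text of the Recursive node, assuming Recursive is a direct child of the inner root; in the question's document it sits deeper, so Descendants("Recursive") would be needed instead.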
