Ampersand in XML url is not passed through - c#

I'm having a hard time fixing this little problem with an ampersand (&) in the URL. I'm deserializing XML as shown below:
var ser = new XmlSerializer(typeof(response));
using (var reader = XmlReader.Create(url))
{
response employeeResults = (response)ser.Deserialize(reader); //<<error when i pass with ampersand
}
The code above works fine if there is no & in the URL; otherwise it throws an error (see below).
I have no problem deserializing this URL:
http://api.host.com/api/employees.xml/?&search=john
I'm having a problem with this URL:
http://api.host.com/api/employees.xml/?&max=20&page=10
The error I'm getting is:
`There is an error in XML document (1, 389).`
PS: I also tried replacing the & with &amp;, &#38 and #026 - no luck.

This XML isn't well-formed:
<?xml version="1.0"?>
<response xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Api">
<meta>
<status>200</status>
<message />
<resultSet>
<Checked>true</Checked>
</resultSet>
<pagination>
<count>1</count>
<page>1</page>
<max>1</max>
<curUri>http://api.host.com/employee.xml/?&max=5</curUri>
<prevUri i:nil="true"/>
<nextUri>http://api.host.com/employee.xml/?&max=5&page=2</nextUri>
</pagination>
</meta>
<results i:type="ArrayOfemployeeItem">
<empItem>
<Id>CTR3242</Id>
<name>john</name>
......
</empItem>
</results>
</response>
You must escape the & character (as &amp;) or put the entire string in a CDATA section, e.g.:
<?xml version="1.0"?>
<response xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/Api">
<meta>
<status>200</status>
<message />
<resultSet>
<Checked>true</Checked>
</resultSet>
<pagination>
<count>1</count>
<page>1</page>
<max>1</max>
<curUri><![CDATA[http://api.host.com/employee.xml/?&max=5]]></curUri>
<prevUri i:nil="true"/>
<nextUri><![CDATA[http://api.host.com/employee.xml/?&max=5&page=2]]></nextUri>
</pagination>
</meta>
<results i:type="ArrayOfemployeeItem">
<empItem>
<Id>CTR3242</Id>
<name>john</name>
......
</empItem>
</results>
</response>
If you are dealing with a third-party system and cannot get a well-formed XML response, you have to do some pre-processing.
The simplest way is probably to replace every & with &amp; using the string.Replace method.
Or use the regex &(?!amp;) to replace every & except those that are already escaped as &amp;.
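A minimal sketch of that pre-processing step follows. The class and method names are hypothetical, and the pattern is a slightly broader variant of the regex above, so that references such as &lt; or &#38; are left alone too. It assumes the response has already been downloaded into a string instead of being read straight from the URL.

```csharp
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper, not part of the original post.
public static class AmpersandFixer
{
    // Matches an & that does not start a predefined entity
    // or a numeric character reference.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    // Repairs the raw text first, then hands it to XmlSerializer.
    public static T Deserialize<T>(string rawXml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = XmlReader.Create(new StringReader(EscapeBareAmpersands(rawXml))))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

With this in place, the call would look something like AmpersandFixer.Deserialize<response>(rawXml), where rawXml is the downloaded response text. This only papers over the problem on the client; the service should still emit escaped URLs.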

Have you tried wrapping the value in <![CDATA[yourValue]]>?
A bare & is not allowed in XML.
deserialize-xml-with-ampersand-using-xmlserializer

Related

Removing Attribute value based on value from an XML using VB.Net

I have an XML as below
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope
xmlns="http://com/uhg/uht/uhtSoapMsg_V1"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Header>
<uhtHeader
xmlns="http://com/uhg/uht/uhtHeader_V1">
<consumer>COMET</consumer>
<auditId></auditId>
<sendTimestamp>2020-09-03T18:15:40.942-05:00</sendTimestamp>
<environment>P</environment>
<businessService version="24">getClaimHistory</businessService>
<status>success</status>
</uhtHeader>
</env:Header>
<env:Body>
<srvcRspn
xmlns="http://com/uhg/uht/getClaimHistory_V24">
<srvcErrList arrayType="srvcErrOccur[1]" type="Array">
<srvcErrOccur>
<orig>Foundation</orig>
<rtnCd>00</rtnCd>
<explCd>000</explCd>
<desc></desc>
</srvcErrOccur>
</srvcErrList>
</srvcRspn>
</env:Body>
</env:Envelope>
I want to remove all the attribute values with "http" like below:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope
xmlns=""
xmlns:env="">
<env:Header>
<uhtHeader
xmlns="">
<consumer>COMET</consumer>
<auditId></auditId>
<sendTimestamp>2020-09-03T18:15:40.942-05:00</sendTimestamp>
<environment>P</environment>
<businessService version="24">getClaimHistory</businessService>
<status>success</status>
</uhtHeader>
</env:Header>
<env:Body>
<srvcRspn
xmlns="">
<srvcErrList arrayType="srvcErrOccur[1]" type="Array">
<srvcErrOccur>
<orig>Foundation</orig>
<rtnCd>00</rtnCd>
<explCd>000</explCd>
<desc></desc>
</srvcErrOccur>
</srvcErrList>
</srvcRspn>
</env:Body>
</env:Envelope>
I have tried several approaches, but none of them has worked for me. Can anyone suggest the fastest way to do this in VB.NET or C#?
The actual response is very large (at least 100,000 lines of XML), and iterating with For Each would take a good amount of time. Is there a parsing method or LINQ query that can do it faster?
I found a way to do it using Regex, as below:
Return Regex.Replace(xmlDoc, "((?<=<|<\/)|(?<= ))[A-Za-z0-9]+:| xmlns(:[A-Za-z0-9]+)?="".*?""", "")
It serves my purpose completely. Thanks Cleptus for your quick reference.
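For C# readers, the same pattern can be sketched as below (the class and method names are hypothetical). Note what the pattern actually does: it removes xmlns declarations and element prefixes outright instead of leaving empty xmlns="" values, and because it works on raw text it would also strip anything in the content that looks like " word:".

```csharp
using System.Text.RegularExpressions;

// Hypothetical C# port of the VB.NET one-liner above.
public static class NamespaceStripper
{
    // First alternative: a prefix such as "env:" directly after "<", "</" or a space.
    // Second alternative: a whole xmlns or xmlns:prefix declaration.
    private static readonly Regex PrefixOrXmlns = new Regex(
        @"((?<=<|<\/)|(?<= ))[A-Za-z0-9]+:| xmlns(:[A-Za-z0-9]+)?="".*?""");

    public static string Strip(string xml)
    {
        return PrefixOrXmlns.Replace(xml, "");
    }
}
```

The xmlns alternative only matches a declaration preceded by a space, so declarations that start a new line are not caught by the pattern as written.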

XML Name Cannot Begin with the "=" Character

I've read through the similar post about the % character, but it seems those issues could be solved in the header line. Are certain characters not allowed in XML, or do I need to format the document another way? (In my case the "=" character is giving me trouble when I try to read the document in C#.)
The "Name cannot begin with the character ' '" error is also similar, but that one was still fixed by the header.
XElement nodes = XElement.Load(filename);
The structure of the XML is below:
<?xml version="1.0" encoding="utf-8"?>
<offer>
<data id="Salary">
<ocrstring>which is equal to $60,000.00 if working 40 hours per week</ocrstring>
<rule>.*(([+-]?\$[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}))</rule>
<output></output>
</data>
<data id="Hours">
<ocrstring></ocrstring>
<rule>"(?<=working).*?(?=hours)"</rule> <!-- Error Occurring Here -->
<output>bob</output>
</data>
<data id="Location">
<ocrstring></ocrstring>
<rule>Regex2</rule>
<output>LongWindingRoad222</output>
</data>
</offer>
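How can I parse the XML document without getting the 'Name cannot begin with the "=" character' error?
You need to use CDATA sections for all the <rule> elements. An unescaped < in element content starts a tag, so the parser reads the <= in the lookbehind (?<=working) as a tag whose name begins with "=".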
What does <![CDATA[]]> in XML mean?
XML
<?xml version="1.0" encoding="utf-8"?>
<offer>
<data id="Salary">
<ocrstring>which is equal to $60,000.00 if working 40 hours per week</ocrstring>
<rule><![CDATA[.*(([+-]?\$[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}))]]></rule>
<output></output>
</data>
<data id="Hours">
<ocrstring></ocrstring>
<rule><![CDATA["(?<=working).*?(?=hours)"]]></rule>
<!-- Error Occurring Here -->
<output>bob</output>
</data>
<data id="Location">
<ocrstring></ocrstring>
<rule>Regex2</rule>
<output>LongWindingRoad222</output>
</data>
</offer>
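Once the rules are wrapped in CDATA (or the < is escaped as &lt;), the pattern comes back unchanged when the element value is read. A small sketch, with hypothetical class and method names:

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for reading one <rule> out of the <offer> document.
public static class RuleReader
{
    // Returns the text of <rule> for the <data> element with the given id,
    // or null when there is no such element.
    public static string ReadRule(string xml, string dataId)
    {
        XElement offer = XElement.Parse(xml);
        return offer.Elements("data")
                    .Where(d => (string)d.Attribute("id") == dataId)
                    .Select(d => (string)d.Element("rule"))
                    .FirstOrDefault();
    }
}
```

The returned string can then be passed to the Regex constructor as usual; the CDATA markers are not part of the value.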

How to extract a node from xml string C#

I have an XML string like the one below:
<?xml version="1.0" encoding="utf-8" ?>
<NodeA xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.air-watch.com/webapi/resources">
<AdditionalInfo>
<Links>
<Link xsi:type="link">
</Link>
</Links>
</AdditionalInfo>
<TotalResults>100</TotalResults>
<NodeB>
<NodeC>
<Id>1</Id>
<A>valueA</A>
<B>valueB</B>
</NodeC>
<NodeC>
<Id>2</Id>
<A>valueA</A>
<B>valueA</B>
</NodeC>
</NodeB>
</NodeA>
I want to extract NodeB and its child nodes (the NodeC elements). How can I do it? The solution below does something similar, but it needs the XML string to be loaded into an XDocument first:
XDocument doc = XDocument.Parse(xmlstr);
String response = doc.Elements("question")
.Where(x => (string)x.Attribute("id") == id) // compare the attribute's value, not the XAttribute itself
.Single()
.Element("response")
.Value;
Is there a way to do it without having to load it into a document, with some operation on the string object itself?
Why can't you use this?
XDocument doc = XDocument.Parse(xmlstr);
String response = doc.Elements("question")
.Where(x => (string)x.Attribute("id") == id) // compare the attribute's value, not the XAttribute itself
.Single()
.Element("response")
.Value;
Otherwise, you can use regular expressions.
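For the NodeA document in the question, the XDocument route could look roughly like this. The class and method names are hypothetical; the one detail that is easy to miss is that NodeB lives in the default namespace declared on NodeA, so the element name has to be qualified with it.

```csharp
using System.Xml.Linq;

// Hypothetical helper, not taken from either answer.
public static class NodeExtractor
{
    // Returns NodeB (with its NodeC children) as an XML string,
    // or null when the document has no NodeB element.
    public static string ExtractNodeB(string xml)
    {
        XNamespace ns = "http://www.air-watch.com/webapi/resources";
        XElement nodeB = XDocument.Parse(xml).Root.Element(ns + "NodeB");
        return nodeB == null ? null : nodeB.ToString(SaveOptions.DisableFormatting);
    }
}
```

Parsing handles nesting, comments and CDATA correctly, which a regular expression over the raw string generally does not.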

There is an error in XML document

I am getting the following exception when I try to deserialize the XML document. The document has a url tag that may contain a Google search link. The Google search link contains '=', which is apparently not accepted in the XML document during deserialization. I get the XML from a server, so I cannot change the string in the url tag; I have to handle it on the client side. How can I overcome this problem?
<?xml version="1.0" encoding="UTF-8"?>
<response>
<status>
<code>000</code>
<message>Successfully completed</message>
</status>
<reports>
<report>
<id>9973</id>
<url>http://www.google.com/search?q=guns&client=safari&safe=active</url>
</report>
</reports>
</response>
Exception :
An exception of type 'System.InvalidOperationException' occurred in System.Xml.XmlSerializer.dll but was not handled in user code
Innerexception:
{"'=' is an unexpected token. The expected token is ';'. Line 136, position 53."}
Your XML is invalid. The URL breaks the XML standard: a bare & starts an entity reference, which is why the parser expects a ';' before it reaches the '='. You should escape each & as &amp;.
This is the valid XML:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<status>
<code>000</code>
<message>Successfully completed</message>
</status>
<reports>
<report>
<id>9973</id>
<url>http://www.google.com/search?q=guns&amp;client=safari&amp;safe=active</url>
</report>
</reports>
</response>
Check your XML export function to make sure it escapes the URL properly.
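On the exporting side, building the element with an XML API instead of string concatenation takes care of the escaping. A minimal sketch using LINQ to XML, assuming the exporter is .NET code (the class and method names are hypothetical):

```csharp
using System.Xml.Linq;

// Hypothetical exporter fragment; the element names follow the sample above.
public static class ReportWriter
{
    // XElement escapes &, < and > in text content when it is serialized.
    public static string BuildReport(int id, string url)
    {
        var report = new XElement("report",
            new XElement("id", id),
            new XElement("url", url));
        return report.ToString(SaveOptions.DisableFormatting);
    }
}
```

Passing the raw Google URL to BuildReport yields a url element in which every & is written as &amp;, and deserializing it gives the original URL back.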

How to read nested XML using XDocument in Silverlight?

Hi, I currently have a nested XML document with the following structure:
<?xml version="1.0" encoding="utf-8" ?>
<Response>
<Result>
<item id="something" />
<price na="something" />
<?xml version="1.0" encoding="UTF-8" ?>
<DIDL-Lite xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:dlna="urn:schemas-dlna-org:metadata-1-0/">
</Result>
<NumberReturned>10</NumberReturned>
<TotalMatches>10</TotalMatches>
</Response>
Any help on how to read this using XDocument or XmlReader would be really helpful.
Thanks,
Subhendu
XDocument and XmlReader are both XML parsers that expect well-formed XML as input. What you have shown is not an XML file. So the first task would be to extract the nested XML, and since the whole thing is not valid XML you cannot rely on any parser to do this job. You'll need to resort to string manipulation and/or regular expressions.
My suggestion would be to fix the procedure that generates this invalid XML in the first place. Another suggestion is to never generate an XML file manually, but to use an appropriate tool for it (XmlWriter, XDocument, ...).
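If the generating side can be changed, one option is to store the inner document as escaped text inside <Result> and parse it in a second step. A rough sketch with hypothetical names, assuming LINQ to XML is available on both ends:

```csharp
using System.Xml.Linq;

// Hypothetical helper showing one way to nest a document safely.
public static class NestedXml
{
    // Producer: the inner XML is added as a string, so its markup is
    // escaped and the outer document stays well-formed.
    public static string Wrap(string innerXml)
    {
        var response = new XElement("Response", new XElement("Result", innerXml));
        return response.ToString(SaveOptions.DisableFormatting);
    }

    // Consumer: read the text of <Result>, then parse it as its own document.
    public static XElement Unwrap(string outerXml)
    {
        string inner = (string)XElement.Parse(outerXml).Element("Result");
        return XElement.Parse(inner);
    }
}
```

This does not help with the malformed response shown in the question; it only illustrates what a well-formed version of the same idea could look like.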
