I'm trying to parse an XML file like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<genrelist>
<genre name="00s"></genre>
<genre name="30s"></genre>
<genre name="40s"></genre>
<genre name="50s"></genre>
</genrelist>
I am using the standard System.Xml deserializer, but I get the error "There is an error in XML document (0, 0)" (translated from my localized message) before parsing even starts, i.e. the XML is reported as invalid. How can I parse this XML?
Deserialization code:
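XmlReaderSettings settings = new XmlReaderSettings(); // not shown in the question; default settings assumed
XmlSerializer serializer = new XmlSerializer(typeof(GenreList));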
XmlReader reader = XmlReader.Create("http://yp.shoutcast.com/sbin/newxml.phtml", settings);
GenreList genrelist = (GenreList)serializer.Deserialize(reader);
I have seen this error when XML files from other systems have an unexpected character at the beginning of the file.
Make sure the file starts with exactly this, with nothing before it:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
Sometimes it's worth opening the file in different editors to see if you can spot the rogue character.
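A minimal sketch of the answer's advice, assuming the response body has already been read into a string: it removes a byte-order mark (U+FEFF) or whitespace sitting in front of `<?xml` before handing the text to XmlSerializer. The question never shows `GenreList`, so the `GenreList` and `Genre` classes here are hypothetical, shaped to match the sample XML; `GenreParser` is likewise an illustrative name.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical model: GenreList is not shown in the question; this shape matches the sample XML.
[XmlRoot("genrelist")]
public class GenreList
{
    [XmlElement("genre")]
    public Genre[] Genres { get; set; }
}

public class Genre
{
    [XmlAttribute("name")]
    public string Name { get; set; }
}

public static class GenreParser
{
    // Deserializes the genre list after removing anything that sits in front of "<?xml".
    public static GenreList Parse(string xml)
    {
        // A BOM character or leading whitespace before the declaration makes the document invalid.
        string cleaned = xml.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        var serializer = new XmlSerializer(typeof(GenreList));
        using (var reader = XmlReader.Create(new StringReader(cleaned)))
        {
            return (GenreList)serializer.Deserialize(reader);
        }
    }
}
```

For the feed in the question, the string could come from `HttpClient.GetStringAsync` before calling `GenreParser.Parse`. If parsing still fails, it's worth dumping the first few characters of the string to see what actually arrives.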
Related
I need to open an xml file and replace
<?xml version="1.0" encoding="iso-8859-15"?>
with
<?xml version="1.0" encoding="UTF-8" href="diwa_klein.xsl"?>
My current code is a simple case of
File.WriteAllText(@copiedfile, File.ReadAllText(@copiedfile).Replace(@"<?xml version=""1.0"" encoding=""iso - 8859 - 15""?> ", @" <?xml version = ""1.0"" ?><? xml - stylesheet type = ""text / xsl"" href = ""diwa_klein.xsl"" ?> "));
This seems to get ignored.
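This question has no answer here. One alternative to string replacement (a swapped-in technique, not the poster's code) is to load the file with LINQ to XML, replace the declaration, and add the stylesheet as a processing instruction: an `href` pseudo-attribute belongs on `<?xml-stylesheet ?>`, since the XML declaration itself only allows `version`, `encoding` and `standalone`. `StylesheetHeader.AddStylesheet` is a hypothetical helper.

```csharp
using System.Xml.Linq;

public static class StylesheetHeader
{
    // Replaces the declaration with a UTF-8 one and puts an xml-stylesheet
    // processing instruction in front of the root element.
    public static XDocument AddStylesheet(XDocument doc, string href)
    {
        doc.Declaration = new XDeclaration("1.0", "UTF-8", null);
        doc.AddFirst(new XProcessingInstruction("xml-stylesheet", $"type=\"text/xsl\" href=\"{href}\""));
        return doc;
    }
}
```

Usage would be along the lines of `var doc = XDocument.Load(copiedfile); StylesheetHeader.AddStylesheet(doc, "diwa_klein.xsl"); doc.Save(copiedfile);`. On .NET Core and later, reading an iso-8859-15 file may first require `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`, because that code page is not built in there.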
I use the .NET class XslCompiledTransform for XSLT transformations and I have a problem with encodings. My input XML contains the word Förstelärare. Here are the cases:
input xml file has <?xml version="1.0" encoding="utf-8"?> - xslt file has <xsl:output encoding="utf-8" ... - OK
input xml file has <?xml version="1.0" encoding="utf-8"?> - xslt file has <xsl:output encoding="iso-8859-1" ... - OK
input xml file has <?xml version="1.0" encoding="iso-8859-1"?> - xslt file has <xsl:output encoding="iso-8859-1" ... - OK
input xml file has <?xml version="1.0" encoding="iso-8859-1"?> - xslt file has <xsl:output encoding="utf-8" ... - CORRUPTED - I see FÃ¶rstelÃ¤rare in output xml.
input.xml:
<?xml version="1.0" encoding="iso-8859-1"?>
<test>Förstelärare</test>
trans.xslt:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-8" standalone="yes" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates select="/test" />
</xsl:template>
<xsl:template match="test">
<test><xsl:value-of select="text()"/></test>
</xsl:template>
</xsl:stylesheet>
C# code:
var xslCompiledTransform = new XslCompiledTransform();
using (var xmlReader = XmlReader.Create(@"C:\trans.xslt", new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore, XmlResolver = null }))
{
xslCompiledTransform.Load(xmlReader);
}
using (var xmlReader = XmlReader.Create(@"C:\input.xml", new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore, XmlResolver = null }))
using (var xmlWriter = XmlWriter.Create(@"C:\output.xml", xslCompiledTransform.OutputSettings))
{
xslCompiledTransform.Transform(xmlReader, xmlWriter);
}
output.xml:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<test>FÃ¶rstelÃ¤rare</test>
Why does this happen? It looks like I need to use iso-8859-1 in my XSLT file to prevent corruption, because that works whether the input is declared as iso-8859-1 or utf-8.
The output you're seeing is the result of interpreting a string encoded with UTF-8 as if it were iso-8859-1.
There are two possibilities:
Your source file is actually encoded as UTF-8: just because the XML declaration says iso-8859-1, that doesn't necessarily mean that's how the text has been saved. (EDIT: Based on comments, I believe this is what's happening in your case.)
Alternatively, when you're writing it out as UTF-8 it's working just fine, but whatever you're using to inspect the output is ignoring that and assuming it's iso-8859-1.
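If the first possibility holds and the program that writes input.xml cannot be fixed, one workaround (an assumption, not part of the answer) is to decode the bytes as UTF-8 yourself and give XmlReader a TextReader. As far as I know, XmlReader only acts on the declaration's encoding when it reads raw bytes, so the incorrect iso-8859-1 label is then ignored. `MislabelledInput.OpenAsUtf8` is a hypothetical helper.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class MislabelledInput
{
    // Opens XML bytes that are really UTF-8, even if the declaration claims iso-8859-1.
    public static XmlReader OpenAsUtf8(Stream input)
    {
        // Decoding happens here, so the XmlReader receives characters rather than bytes.
        var text = new StreamReader(input, Encoding.UTF8);
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null,
            CloseInput = true // disposing the XmlReader also disposes the StreamReader
        };
        return XmlReader.Create(text, settings);
    }
}
```

In the question's code this would replace the reader for input.xml, e.g. `using (var xmlReader = MislabelledInput.OpenAsUtf8(File.OpenRead(@"C:\input.xml")))`. Correcting the declaration at the source remains the cleaner fix.");
var reader = MislabelledInput.OpenAsUtf8(new System.IO.MemoryStream(bytes));
var doc = System.Xml.Linq.XDocument.Load(reader);
reader.Dispose();
System.Diagnostics.Trace.Assert(doc.Root.Value == "F\u00F6rstel\u00E4rare");
</test>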
Here's the character in its various encodings:
http://www.fileformat.info/info/unicode/char/00f6/index.htm
I would suggest looking at your source document in a hex editor. Immediately following the 'F' (70, or 0x46, in both encodings), you should see 0xF6 if it's in iso-8859-1 as per the XML declaration, in which case you're probably reading the output in the wrong encoding. If it's 0xC3 0xB6, that's UTF-8, and the encoding in the XML declaration of your source is wrong.
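The same check can be done in code. This sketch (with a hypothetical `EncodingCheck` class) shows the byte patterns the answer describes, reproduces the corruption by decoding UTF-8 bytes as ISO-8859-1, and classifies raw file bytes. It assumes the first 'F' in the file is the one in "Förstelärare", which holds for the sample input.xml.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Hex dump of a string's bytes in one encoding, e.g. "C3-B6" for "ö" in UTF-8.
    public static string Hex(string s, Encoding encoding) =>
        BitConverter.ToString(encoding.GetBytes(s));

    // Reproduces the symptom: UTF-8 bytes decoded as if they were ISO-8859-1.
    public static string MisreadUtf8AsLatin1(string s) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(s));

    // Inspects the byte(s) right after the first 'F' (0x46) in raw file contents.
    public static string ClassifyByteAfterF(byte[] raw)
    {
        int i = Array.IndexOf(raw, (byte)0x46);
        if (i < 0 || i + 1 >= raw.Length) return "no 'F' found";
        if (raw[i + 1] == 0xF6) return "iso-8859-1";
        if (i + 2 < raw.Length && raw[i + 1] == 0xC3 && raw[i + 2] == 0xB6) return "utf-8";
        return "unknown";
    }
}
```

Against the real file this would be called as `EncodingCheck.ClassifyByteAfterF(File.ReadAllBytes(@"C:\input.xml"))`.")) == "utf-8");
System.Diagnostics.Trace.Assert(EncodingCheck.ClassifyByteAfterF(System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes("<test>F\u00F6rstel\u00E4rare</test>")) == "iso-8859-1");
</test>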
I want to know how to remove
<?xml version="1.0" encoding="UTF-8"?>
from a string data.
I have tried this but it doesn't work
string result = data.Replace("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>", "");
(I am not really working with XML; it's just a response that I want to manipulate without the header.)
Let's look at your two strings. With the escapes removed, they are:
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-8" ?>
In other words you've managed to add an extra space. Remove that and your code will succeed.
More broadly, one wonders why you are attempting to do this. Simple text processing of XML files is liable to lead to pain and suffering. Perhaps you should consider using a parser.
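If the exact spacing of the header can vary, an exact-match Replace stays fragile. Here is a sketch of two more tolerant options (illustrative, not the answer's code; `DeclarationStripper` is a hypothetical name): a regular expression that removes a leading declaration whatever its spacing or attributes, and, following the answer's advice, a parser-based version that re-serializes only the root element.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class DeclarationStripper
{
    // "<?xml" must be followed by whitespace, so <?xml-stylesheet ...?> is left alone.
    static readonly Regex LeadingDeclaration = new Regex(@"^\s*<\?xml\s[^>]*\?>\s*");

    // Text approach: removes one leading <?xml ...?> regardless of spacing.
    public static string StripWithRegex(string data) =>
        LeadingDeclaration.Replace(data, "", 1);

    // Parser approach: XElement.ToString never writes an XML declaration.
    // Note: comments or processing instructions outside the root element are dropped too.
    public static string StripWithParser(string data) =>
        XDocument.Parse(data).Root.ToString(SaveOptions.DisableFormatting);
}
```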
I have an XML template file like this:
<?xml version="1.0" encoding="us-ascii"?>
<AutomatedDispenseResponse>
<header shipmentNumber=""></header>
<items></items>
</AutomatedDispenseResponse>
When I use XDocument.Load, for some reason the
<?xml version="1.0" encoding="us-ascii"?>
is dropped.
How do I load the file into an XDocument without losing the declaration at the top?
I suspect it's not really dropping the declaration on load - it's when you're writing the document out that you're missing it. Here's a sample app which works for me:
using System;
using System.Xml.Linq;
class Test
{
static void Main()
{
XDocument doc = XDocument.Load("test.xml");
Console.WriteLine(doc.Declaration);
}
}
And test.xml:
<?xml version="1.0" encoding="us-ascii" ?>
<Foo>
<Bar />
</Foo>
Output:
<?xml version="1.0" encoding="us-ascii"?>
The declaration isn't shown by XDocument.ToString(), and may be replaced when you use XDocument.Save because you may be using something like a TextWriter which already knows which encoding it's using. If you save to a stream or just to a filename, it's preserved in my experience.
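This sketch shows the difference the answer describes, assuming the document is already in memory (`DeclarationDemo` is a hypothetical helper). Saving to a Stream lets LINQ to XML pick the writer's encoding from the declaration, so `encoding="us-ascii"` survives. A StringWriter is already UTF-16, so the declaration it writes reports utf-16 instead.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class DeclarationDemo
{
    // Stream target: the writer encoding is taken from doc.Declaration, so it is written back.
    // Decoding the bytes as ASCII assumes a us-ascii document like the one in the question.
    public static string SaveToStream(XDocument doc)
    {
        using (var ms = new MemoryStream())
        {
            doc.Save(ms);
            return Encoding.ASCII.GetString(ms.ToArray());
        }
    }

    // TextWriter target: the StringWriter's own UTF-16 encoding ends up in the declaration.
    public static string SaveToStringWriter(XDocument doc)
    {
        using (var sw = new StringWriter())
        {
            doc.Save(sw);
            return sw.ToString();
        }
    }
}
```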
It is loaded. You can see it and access parts of it using:
XDocument.Parse(myDocument).Declaration
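The individual parts are exposed as `Version`, `Encoding` and `Standalone` on `XDeclaration`. Here is a small hypothetical helper that reads them. Each part is null when the attribute is absent, and `Declaration` itself is null when the document has none.

```csharp
using System.Xml.Linq;

public static class DeclarationParts
{
    // Returns (Version, Encoding, Standalone); all null if there is no declaration.
    public static (string Version, string Encoding, string Standalone) Read(string xml)
    {
        XDeclaration decl = XDocument.Parse(xml).Declaration;
        if (decl == null)
        {
            return (null, null, null);
        }
        return (decl.Version, decl.Encoding, decl.Standalone);
    }
}
```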
Hi, I currently have a nested XML document with the following structure:
<?xml version="1.0" encoding="utf-8" ?>
<Response>
<Result>
<item id="something" />
<price na="something" />
<?xml version="1.0" encoding="UTF-8" ?>
<DIDL-Lite xmlns="urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:upnp="urn:schemas-upnp-org:metadata-1-0/upnp/" xmlns:dlna="urn:schemas-dlna-org:metadata-1-0/">
</Result>
<NumberReturned>10</NumberReturned>
<TotalMatches>10</TotalMatches>
</Response>
Any help on how to read this using XDocument or XmlReader would be really helpful.
Thanks,
Subhendu
XDocument and XmlReader are both XML parsers that expect well-formed XML as input. What you have shown is not well-formed XML. So the first task would be to extract the nested XML, and since this is not valid XML, you cannot rely on any parser to do this job. You'll need to resort to string manipulation and/or regular expressions.
My suggestion would be to fix the process that generates this invalid XML in the first place. Another suggestion is to never generate an XML file by hand, but to use an appropriate tool for this (XmlWriter, XDocument, ...).
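This is a sketch of that second suggestion. It assumes `<Result>` is meant to carry the whole DIDL-Lite document as a string, and leaves out the sample's `item` and `price` elements. If the producer builds the response with LINQ to XML and passes the inner document as string content, the `<` characters are escaped on output. The response then stays well-formed, and the consumer can parse the outer and inner documents separately. `NestedXml` and its methods are hypothetical names.

```csharp
using System.Xml.Linq;

public static class NestedXml
{
    // Producer side: the inner document is stored as text, so it is escaped when written.
    public static string BuildResponse(string innerXml, int returned, int total)
    {
        var response = new XElement("Response",
            new XElement("Result", innerXml),
            new XElement("NumberReturned", returned),
            new XElement("TotalMatches", total));
        return response.ToString();
    }

    // Consumer side: parse the outer document, then parse the unescaped Result text on its own.
    public static XDocument ReadInner(string responseXml)
    {
        string inner = XDocument.Parse(responseXml).Root.Element("Result").Value;
        return XDocument.Parse(inner);
    }
}
```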