How to remove all namespaces from broken XML with C#?

I found a way to remove all namespaces from XML, but it is not working for me because I sometimes get a broken XML feed, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress.com" -->
<rss version="2.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<channel>
<title>sabri ?lker - WordPress.com Search</title>
<link>http://tr.search.wordpress.com/?q=sabri+%C3%BClker&amp;page=2&amp;t=comment&amp;s=date</link>
<description>sabri ?lker - WordPress.com Search</description>
<pubDate>Fri, 04 Jan 2013 08:58:41 +0000</pubDate>
<language>tr</language>
<image><url>http://s.wordpress.com/i/buttonw-com.png</url><width>224</width><height>58</height><title>WordPress.com</title><link>http://wordpress.com/</link></image>
<generator>http://search.wordpress.com/</generator>
<atom:link rel="self" type="application/rss+xml" href="http://tr.search.wordpress.com/?q=sabri+%C3%BClker&amp;page=2&amp;t=comment&amp;s=date&amp;f=feed" />
<atom:link rel="search" type="application/opensearchdescription+xml" href="http://en.search.wordpress.com/opensearch.xml" title="WordPress.com" />
<opensearch:totalResults>10</opensearch:totalResults><opensearch:startIndex>11</opensearch:startIndex><opensearch:itemsPerPage>10</opensearch:itemsPerPage><opensearch:Query role="request" searchTerms="sabri ?lker startPage=\"2" /></channel>
</rss>
My exception is "Name cannot begin with the '2' character, hexadecimal value 0x32. Line 17, position 227." What should I do to solve this problem?

I'd say the reason is the ill-formed searchTerms attribute:
searchTerms="sabri ?lker startPage=\"2"
It's quoted the wrong way: it should use &quot; instead of \". You could simply replace every \" with &quot;:
string input = ..; // your xml
string processedInput = input.Replace("\\\"", "&quot;");
// then feed this into your xml parser.
This should solve your issue, but it is of course not a general way of sanitizing malformed XML input. You may want to have a look at http://tidyfornet.sourceforge.net/, which can sanitize HTML, XHTML and XML.
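As a minimal sketch of the replacement described above: the feed fragment below is a shortened, hypothetical stand-in for the real feed, and the element lookup at the end is only there to show that the document parses once the quote is fixed.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical fragment reproducing the malformed attribute: startPage=\"2"
string input =
    "<rss version=\"2.0\"><channel>" +
    "<Query role=\"request\" searchTerms=\"sabri startPage=\\\"2\" />" +
    "</channel></rss>";

// Replace the backslash-escaped quote (\") with the XML entity &quot;.
string processedInput = input.Replace("\\\"", "&quot;");

// The sanitized text is now well-formed and can be parsed.
XDocument doc = XDocument.Parse(processedInput);
var searchTerms = (string)doc.Root?.Element("channel")?.Element("Query")?.Attribute("searchTerms");
Console.WriteLine(searchTerms);
```

This only repairs this one kind of damage; any other malformed construct in the feed would still make XDocument.Parse throw.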

Related

Reading an XML file with multi-language characters corrupts the Japanese characters

The XML file declares its encoding as below:
<?xml version="1.0" encoding="iso-8859-1"?>
It contains some Japanese characters, as shown below:
<Name>
<![CDATA[熊本大学Slave_1002 大 [EL2002]]]>
</Name>
Reading the file corrupts the Japanese characters, and the name becomes:
<Name><![CDATA[????Slave_1002 ? [EL2002]]]></Name>
Below is the code used to read the file.
using (StreamReader streamReader = new StreamReader(filePath, System.Text.Encoding.GetEncoding("iso8859-1")))
{
    XDocument xdoc = XDocument.Load(streamReader);
}
I tried UTF-8 and Unicode encodings as well.

I quickly checked the spec, and as far as I understand it a CDATA section should have the same encoding as the rest of the document, but there are some known issues. Since you have already tried UTF-8: is there any other encoding specified in the XML declaration, <?xml version="1.0" encoding="like here" ?>? It is strange that you can see those characters in a text editor.
The iso-8859-1 encoding is Latin-1; there is no way it can represent Japanese. So I created a test XML file like this:
<?xml version="1.0" encoding="utf-8"?>
<Name>
<![CDATA[熊本大学Slave_1002 大 [EL2002]]]>
</Name>
VS told me to save it as UTF-8 and did not allow me to select iso-8859-1 as the document encoding at all. I also wrote a test program:
var xml = XDocument.Load(@"..\..\test.xml");
var val = ((XCData)xml.Root.FirstNode).Value;
Console.WriteLine(val);
File.WriteAllText(@"..\..\cdata.txt", val);
Console.ReadLine();
which gives me on console
but in text file..
To sum up:
I think the XML is not in the declared encoding (at least partially).
System.Xml.Linq works fine, so it is not a quirk of the library.
You might be reading the value correctly but having trouble viewing it.
I changed the declared document encoding to iso-8859-1 and used new StreamReader(@"..\..\test.xml", Encoding.UTF8) as the XDocument source, and the result was correct.
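The last experiment can be sketched roughly as follows. It assumes the file's bytes are really UTF-8 even though the declaration says iso-8859-1, and it uses an in-memory stream (a hypothetical stand-in for test.xml) so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical document: declares iso-8859-1 but is actually stored as UTF-8 bytes.
string xmlText =
    "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>" +
    "<Name><![CDATA[熊本大学Slave_1002 大 [EL2002]]]></Name>";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(xmlText);

XDocument xdoc;
using (var stream = new MemoryStream(utf8Bytes))
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    // The StreamReader decodes the bytes as UTF-8 before the parser sees them,
    // so the encoding named in the declaration is not used for decoding.
    xdoc = XDocument.Load(reader);
}

string value = ((XCData)xdoc.Root.FirstNode).Value;
Console.WriteLine(value);
```

Whether the characters then display correctly depends on the console's own encoding and font, which is a separate issue from reading the value.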

How to extract an attribute from an XML tag with C#?

<channel>
<title>test + test</title>
<link>http://testprog.test.net/api/test</link>
<description>test.com</description>
<category>test + test</category>
<item xml:base="http://test.com/test.html?id=25">
<guid isPermaLink="false">25</guid>
<link>http://test.com/link.html</link>
<title>title test</title>
<description>Description test description test</description>
<a10:updated>2015-05-26T10:23:53Z</a10:updated>
<enclosure type="" url="http://test.com/test/test.jpg" width="200" height="200"/>
</item>
</channel>
I extracted the title (title test) like this:
title = ds.Tables["item"].Rows[0]["title"] as string;
How can I extract the url attribute from the <enclosure> tag with C#?
Thanks.

First option
You can create classes to map and deserialize the XML into objects, and then easily access the values as properties.
Second option
If you are only interested in a few values and you don't want to create mapping classes, you can use XPath; there are many articles and answered questions that you can easily find.
To extract the url attribute from the enclosure tag you can use the path:
"/channel/item/enclosure/@url"

There are many, many articles that will help you read XML, but the simple answer is to load your XML into an XML document, and simply call
doc.GetElementsByTagName("enclosure")
This will return an XmlNodeList with all 'enclosure' tags found in your document. I would really recommend doing some reading about using XML to make sure your application is functional and robust.
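Both suggestions, GetElementsByTagName and an XPath that selects the url attribute directly, can be sketched with XmlDocument as below. The XML is a simplified, hypothetical version of the feed; the a10:updated element is left out because its prefix is not declared in the snippet and would prevent the document from loading.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Simplified, hypothetical version of the feed from the question.
string xml =
    "<channel><item>" +
    "<title>title test</title>" +
    "<enclosure type=\"\" url=\"http://test.com/test/test.jpg\" width=\"200\" height=\"200\"/>" +
    "</item></channel>";

var doc = new XmlDocument();
doc.LoadXml(xml);

// Option A: collect the url attribute of every <enclosure> element.
var urls = new List<string>();
foreach (XmlElement enclosure in doc.GetElementsByTagName("enclosure"))
{
    urls.Add(enclosure.GetAttribute("url")); // empty string if the attribute is missing
}

// Option B: XPath; "@url" selects the attribute node, or null if nothing matches.
var urlNode = doc.SelectSingleNode("/channel/item/enclosure/@url");
var url = urlNode?.Value;

Console.WriteLine(url);
```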

You can use LINQ to XML, which is well suited to this.
Please refer to the code:
string xml = @"<channel>
<title>test + test</title>
<link>http://testprog.test.net/api/test</link>
<description>test.com</description>
<category>test + test</category>
<item xml:base=""http://test.com/test.html?id=25"">
<guid isPermaLink=""false"">25</guid>
<link>http://test.com/link.html</link>
<title>title test</title>
<description>Description test description test</description>
<a10>2015-05-26T10:23:53Z</a10>
<enclosure type="""" url=""http://anupshah.com/test/test.jpg"" width=""200"" height=""200""/>
</item>
</channel>";
var str = XElement.Parse(xml);
var result = (from myConfig in str.Elements("item")
              select myConfig.Elements("enclosure").Attributes("url").SingleOrDefault())
             .First();
// result is an XAttribute; .Value gives just the URL text
Console.WriteLine(result.Value);
I hope it will help you.

Malformed xml not parsing in XDocument.Parse

Please help. I have an XML document that I want to parse with XDocument, but the XML string contains several empty xmlns attributes (xmlns=""). The moment I remove them it parses, but I receive this XML from a web service. I tried replacing them, but whichever way I try, only one " is replaced and I am left with an invalid XML string such as <test xmlns=">. I tried regex, I tried the Replace function, I tried every way I know, and I am now stuck.
Any suggestions?
string xmlString = @"
<UserFile xmlns=""http://temuri.org"">
<user>
<UserName>Daniel</UserName>
<UserSurname>Vrey</UserSurname>
<Toys xmlns="">
<TToy>Toyota</TToy>
<TToy>Ford</TToy>
</Toys>
</user>
</UserFile>";
XDocument d = XDocument.Parse(xmlString);
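One plausible cause, assuming the XML really is written as a C# verbatim string literal (@"...") as in the snippet: inside such a literal, "" stands for a single quote character, so xmlns="" in the source code becomes xmlns=" at run time, which matches the invalid <test xmlns="> described. An empty xmlns attribute itself is legal and only resets the default namespace. A minimal sketch under that assumption (some elements are omitted for brevity):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// In a verbatim string, "" is one literal quote, so an empty attribute value needs four quotes.
string xmlString = @"
<UserFile xmlns=""http://temuri.org"">
  <user>
    <UserName>Daniel</UserName>
    <Toys xmlns="""">
      <TToy>Toyota</TToy>
      <TToy>Ford</TToy>
    </Toys>
  </user>
</UserFile>";

XDocument d = XDocument.Parse(xmlString);

XNamespace ns = "http://temuri.org";
// <user> is in the default namespace; <Toys> and its children are in no namespace
// because xmlns="" resets the default namespace for that subtree.
var toys = d.Root.Element(ns + "user").Element("Toys")
            .Elements("TToy").Select(t => t.Value).ToList();

Console.WriteLine(string.Join(", ", toys));
```

If the string actually arrives from the web service at run time rather than from a literal in source code, no quote doubling is involved and the text can be passed to XDocument.Parse unchanged.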

Error when reading XML

I am currently writing an XML writer/reader. I have it writing to the XML file, and now I am attempting to read from it. However, when I do so the following error is thrown and I am not sure why:
'>' is an unexpected token. The expected token is '='. Line 6, position 16.
Please could someone shed some light on this for me?
The XML file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<assignments>
<assignment>
<ModuleTitle>Internet Programming</ModuleTitle>
<AssignmentTitle>Assignment 01</AssignmentTitle>
<Date Given>11/02/2015</Date Given>
<Date Due>20/02/2015</Date Due>
</assignment>
</assignments>
UPDATE:
The problem was the fact that in some of my tag names I had spaces, which was causing the error.

You have invalid spaces, the following will work:
XElement config = XElement.Parse(
@"<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<assignments>
<assignment>
<ModuleTitle>Internet Programming</ModuleTitle>
<AssignmentTitle>Assignment 01</AssignmentTitle>
<DateGiven>11/02/2015</DateGiven>
<DateDue>20/02/2015</DateDue>
</assignment>
</assignments>");
Please note that DateGiven and DateDue are written without spaces.
The spaces are the reason for the error.

<Date Given> is not valid XML syntax. The parser reads Given as an attribute, which must have a value, so it expects something like this: <Date Given="true">
Edit to be useful in the future: as @James mentioned, it is just a space in the tag name, which is also invalid in XML.
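Since the file comes from your own writer, one way to catch this before writing is to validate each tag name first. This is only a sketch: IsValidXmlName is a hypothetical helper name, while XmlConvert.VerifyName and XmlConvert.EncodeLocalName are the framework calls it relies on.

```csharp
using System;
using System.Xml;

// Hypothetical helper: true if the string can be used as an XML element name.
static bool IsValidXmlName(string name)
{
    try
    {
        XmlConvert.VerifyName(name); // throws XmlException for names such as "Date Given"
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

Console.WriteLine(IsValidXmlName("Date Given"));
Console.WriteLine(IsValidXmlName("DateGiven"));

// If the space must be preserved somehow, EncodeLocalName escapes invalid characters.
string encoded = XmlConvert.EncodeLocalName("Date Given");
Console.WriteLine(encoded);
```

Simply removing the space, as in DateGiven above, is usually the more readable choice; the encoded form is mainly useful when names come from data you do not control.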

XML tag meaning

I have part of an XML file:
<Text><?xml version="1.0" encoding="utf-16"?>
<ObjectFilter xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<FilterConditions>
<FilterCondition>
<PropertyFilters>
<PropertyFilter>
<PropertyName>Message</PropertyName>
<FilterValue xsi:type="xsd:string">PPM exceeds tolerance</FilterValue>
<FilterType>strExpr</FilterType>
<Operator>eq</Operator>
<CaseSensitive>true</CaseSensitive>
<Recursive>false</Recursive>
</PropertyFilter>
</PropertyFilters>
<Enabled>true</Enabled>
<ObjectTypeName>Spo.DataModel.UnixLogMessage</ObjectTypeName>
<ObjectClassGR>
<Guid>00000000-0000-0000-0000-000000000000</Guid>
</ObjectClassGR>
What does the Recursive node mean here? It should actually look like <Recursive>false</Recursive>,
so how come it appears as &lt ;Recursive>false&lt ;/Recursive >?
Can anyone help me with this?

How are you getting this XML file? From a webpage?
It seems that the way you are getting the text file treats it as an HTML document and thus turns your '<' into &lt; and your '>' into &gt;.
You need to ensure that the page is not interpreted as HTML. You could just copy-paste everything into Notepad first for a simple solution.
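If the text has already been escaped by the time you receive it, it can also be decoded in code. A minimal sketch, assuming the content arrives as a string with HTML-style entities (the fragment below is a shortened, hypothetical piece of the filter):

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// Hypothetical escaped fragment, as it might look after being treated as HTML text.
string escaped =
    "&lt;PropertyFilter&gt;&lt;Recursive&gt;false&lt;/Recursive&gt;&lt;/PropertyFilter&gt;";

// Turn &lt; and &gt; back into < and >.
string decoded = WebUtility.HtmlDecode(escaped);

XElement filter = XElement.Parse(decoded);
var recursive = (string)filter.Element("Recursive");
Console.WriteLine(recursive);
```

If the escaped text is the content of an element in a larger XML document (such as <Text> here), reading that element's value through an XML parser already returns the unescaped text, so no separate decoding step is needed in that case.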
