I have an XML file whose declaration specifies the following encoding:
<?xml version="1.0" encoding="iso-8859-1"?>
It contains some Japanese characters, for example:
<Name>
<![CDATA[熊本大学Slave_1002 大 [EL2002]]]>
</Name>
Reading the file corrupts the Japanese characters, and the name becomes:
<Name><![CDATA[????Slave_1002 ? [EL2002]]]></Name>
Below is the code used to read the file:
using (StreamReader streamReader = new StreamReader(filePath, System.Text.Encoding.GetEncoding("iso8859-1")))
{
    XDocument xdoc = XDocument.Load(streamReader);
}
I also tried the UTF-8 and Unicode encodings.
I quickly checked the spec and, as far as I understand it, a CDATA section has the same encoding as the rest of the document, but there are some known issues. Since you have already tried UTF-8: is any other encoding specified in the document's XML declaration (<?xml version="1.0" encoding="like here" ?>)? It is strange that you can see those characters in a text editor.
The iso-8859-1 encoding is Latin-1; there is no way it can represent Japanese characters. So I created a test XML file like this:
<?xml version="1.0" encoding="utf-8"?>
<Name>
<![CDATA[熊本大学Slave_1002 大 [EL2002]]]>
</Name>
VS told me to save it as UTF-8 and did not allow me to select iso-8859-1 as the document encoding at all. I also wrote a test program:
var xml = XDocument.Load(@"..\..\test.xml");
var val = ((XCData)xml.Root.FirstNode).Value;
Console.WriteLine(val);
File.WriteAllText(@"..\..\cdata.txt", val);
Console.ReadLine();
On the console the characters are not displayed correctly, but in the text file they are.
To sum up:
I think the XML is not in the declared encoding (at least partially).
System.Xml.Linq works fine, so it's not a quirk of the library.
You might be reading the value correctly but having trouble viewing it.
I changed the declared document encoding to iso-8859-1 and used new StreamReader(@"..\..\test.xml", Encoding.UTF8) as the XDocument source, and the result was correct.
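The last experiment can be sketched as follows, assuming the file's bytes really are UTF-8 even though the declaration says iso-8859-1. The class and method names (EncodingOverrideDemo, ReadNameAsUtf8, BuildMislabeledSample) are made up for illustration. Because XDocument.Load receives an already-decoded TextReader here, the encoding named in the declaration is not used to decode the text.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class EncodingOverrideDemo
{
    // Decodes the bytes as UTF-8 ourselves and hands the decoded text to XDocument,
    // so the (wrong) encoding named in the XML declaration is not used for decoding.
    public static string ReadNameAsUtf8(byte[] fileBytes)
    {
        using (var stream = new MemoryStream(fileBytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            XDocument doc = XDocument.Load(reader);
            return doc.Root.Value.Trim();
        }
    }

    // Hypothetical sample: the declaration claims iso-8859-1,
    // but the content is actually encoded as UTF-8 (without a BOM).
    public static byte[] BuildMislabeledSample()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n" +
                     "<Name><![CDATA[熊本大学Slave_1002 大 [EL2002]]]></Name>";
        return new UTF8Encoding(false).GetBytes(xml);
    }
}
```

If the characters still look wrong after this, it is worth writing the value to a UTF-8 text file before blaming the parser, since the console may simply be unable to display them.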
I have a special XML file with UTF-16 encoding. The file is used to store data, and I need to edit it from a C# Windows Forms application.
<?xml version="1.0" encoding="utf-16"?>
<cProgram xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" ID="b0eb0c7e-f4de-4bc7-9e62-7a086a8c2fn8" Version="16.01" xmlns="cProgram">
<Serie>N </Serie>
<No>123456</No>
<type>101</type>
<Dataset4>larg data here 2 million char</Dataset4>
</cProgram>123456FF896631N 4873821012013-06-14
The problem is that it is not an ordinary XML file:
at the very end of the file there is an extra line of text, which causes this error
Data at the root level is invalid. Line x, position x
when I try to load it as an XML file.
I tried temporarily removing the last line and putting it back after changing the inner text. That works, but I lose the declaration line, and I did not find a way to rewrite it while that text is at the end of the file.
So I need to change the InnerText of the (Serie) and (No) nodes,
but I don't want to lose the declaration line or the text at the end of the file.
Try this piece of code:
string content = "";
string[] separators = new string[] { "</cProgram>" };
using (StreamReader sr = new StreamReader("C://blah.xml"))
{
    content = sr.ReadToEnd();
    Console.WriteLine(content);
}
// Keep only the part before the closing tag, then put the closing tag back
string text = content.Split(separators, StringSplitOptions.None)[0];
text += "</cProgram>";
XmlDocument xd = new XmlDocument();
xd.LoadXml(text);
Console.Read();
Hope this helps.
XDocument.Save() should persist the XML declaration line if the declaration exists initially. I also checked with your XML, and the declaration line was saved as expected:
var xml = @"<?xml version=""1.0"" encoding=""utf-16""?>
<cProgram xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" ID=""b0eb0c7e-f4de-4bc7-9e62-7a086a8c2fn8"" Version=""16.01"" xmlns=""cProgram"">
<Serie>N </Serie>
<No>123456</No>
<type>101</type>
<Dataset4>larg data here 2 million char</Dataset4>
</cProgram>";
var doc = XDocument.Parse(xml);
doc.Save("test.xml");
So you can implement your idea of temporarily removing the last line and putting it back after changing the inner text.
FYI, XDocument's .ToString() method doesn't write the XML declaration line, but the .Save() method does. Related question: How to print <?xml version="1.0"?> using XDocument
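The idea of setting the trailing text aside, editing the XML, and putting everything back can be sketched like this. It is only an illustration under some assumptions: the whole file fits in a string, the root element is cProgram as in the question, and the helper name TrailingTextXmlEditor.SetSerieAndNo is invented here. The output is re-indented by XDocument, so the layout may differ from the original file.

```csharp
using System;
using System.Xml.Linq;

public static class TrailingTextXmlEditor
{
    // Splits the content at the closing root tag, edits the XML part with XDocument,
    // then puts the declaration and the trailing text back.
    public static string SetSerieAndNo(string fileContent, string serie, string no)
    {
        const string closingTag = "</cProgram>";
        int end = fileContent.LastIndexOf(closingTag, StringComparison.Ordinal);
        if (end < 0) throw new ArgumentException("Closing root tag not found.");
        end += closingTag.Length;

        string xmlPart = fileContent.Substring(0, end);
        string trailingText = fileContent.Substring(end);

        XDocument doc = XDocument.Parse(xmlPart);
        XNamespace ns = doc.Root.Name.Namespace; // "cProgram" in the question's file
        doc.Root.Element(ns + "Serie").Value = serie;
        doc.Root.Element(ns + "No").Value = no;

        // ToString() omits the declaration, so prepend it explicitly when present.
        string declaration = doc.Declaration == null
            ? ""
            : doc.Declaration + Environment.NewLine;
        return declaration + doc.ToString() + trailingText;
    }
}
```

The returned string could then be written back with File.WriteAllText(path, result, Encoding.Unicode) to keep the file in UTF-16.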
Allow me to answer my own question.
When I used doc.Load(filepath); it always gave an error because of the extra last line,
and C# uses UTF-8 by default when working with XML files, but in this question the file is UTF-16.
So I found a very short way to do this and replace the inner text with the strings I want:
string text = File.ReadAllText(filepath);
text = text.Replace("<Serie>N", "<Serie>" + textBox1.Text);
text = text.Replace("<Nom>487382", "<Nom>" + textBox2.Text);
// Save the file as UTF-16
File.WriteAllText("new.xml", text, Encoding.Unicode);
Related question: How to save this string into XML file? (It is more related to the answer than to the question.)
I got this error while parsing a string into an XDocument after editing and saving it. Can anyone help me locate the error position (Line 1, position 10475)? How can I find that position?
System.Xml.XmlException: Unexpected XML declaration. The XML
declaration must be the first node in the document, and no white space
characters are allowed to appear before it. Line 1, position 10475.
if (storage.FileExists("APPSDATA.xml"))
{
    var reader = new StreamReader(new IsolatedStorageFileStream("APPSDATA.xml", FileMode.Open, storage));
    string xml = reader.ReadToEnd();
    var xdoc = XDocument.Parse(xml); // error here
    reader.Close();
}
The XML is big; this is just a part of it:
<?xml version="1.0" encoding="UTF-8"?>
<Ungdungs>
<Ungdung>
<Name>HERE City Lens</Name>
<Id>b0a0ac22-cf9e-45ba-8120-815450e2fd71</Id>
<Path>/Icon/herecitylens.png</Path>
<Version>1.0.0.0</Version>
<Category>HERE</Category>
<Date>Uknown</Date>
</Ungdung>
<Ungdung>
<Name>HERE Transit</Name>
<Id>adfdad16-b54a-4ec3-b11e-66bd691be4e6</Id>
<Path>/Icon/heretransit.png</Path>
<Version>1.0.0.0</Version>
<Category>HERE</Category>
<Date>Uknown</Date>
</Ungdung>
Make sure your <?xml declaration is the very first thing in the document, with nothing before it (this includes whitespace).
You can have <?xml only once per document, so if you have a large chunk of XML and the declaration is repeated somewhere further down, your document won't be valid.
In my case this was related to the byte order mark (BOM). I opened the file in Notepad++, selected the encoding "Encode in UTF-8 without BOM", and was then able to see the annoying character and delete it.
This error might also occur if you previously saved the XML file with the boolean 'append = true'.
Make it 'false' and it should work.
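To find where a stray declaration sits, one option is simply to search the text for every "<?xml " occurrence. The sketch below does that; XmlDeclarationFinder and FindDeclarationOffsets are names invented for this example. Any offset other than 0 points at a problem: offset 1 usually means a BOM or other character precedes the declaration, and a later offset usually means a second document was appended to the first.

```csharp
using System;
using System.Collections.Generic;

public static class XmlDeclarationFinder
{
    // Returns the zero-based character offsets of every "<?xml " in the text.
    // The trailing space avoids matching processing instructions such as
    // <?xml-stylesheet ...?>.
    public static List<int> FindDeclarationOffsets(string text)
    {
        var offsets = new List<int>();
        int index = text.IndexOf("<?xml ", StringComparison.Ordinal);
        while (index >= 0)
        {
            offsets.Add(index);
            index = text.IndexOf("<?xml ", index + 1, StringComparison.Ordinal);
        }
        return offsets;
    }
}
```

The position reported in the exception is one-based, so "Line 1, position 10475" would correspond to an offset around 10474 in a single-line file. The same numbers are also available programmatically through the LineNumber and LinePosition properties of XmlException.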
I'm downloading and parsing a lot of XML files from the Internet. They all have different encodings, which are declared on the first line.
<?xml version="1.0" encoding="windows-1251"?>
<?xml version="1.0" encoding="UTF-8"?>
and so on...
I need to set the correct WebClient.Encoding value in order to receive the text in the correct encoding, but I can't do that without downloading the file first and reading the first line.
Is it possible to do this?
Thank you
You don't need to set anything - you don't need to handle the encoding at all. Just get the binary data and get the XML parsers to handle it. Or if you're going to store the files on disk, just dump the binary data straight onto disk. You don't need to worry about the encoding at all.
Simply use this and it should handle everything on its own:
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
XDocument.Load(myHttpWebResponse.GetResponseStream());
http://msdn.microsoft.com/en-us/library/system.xml.linq.xdocument.aspx
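The same behaviour can be checked without a network connection. In the sketch below (EncodingDetectionDemo and its methods are illustrative names, not part of any library), the same loader is given raw bytes in two different encodings, and the parser works out the encoding from the byte order mark and the XML declaration.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class EncodingDetectionDemo
{
    // Loads XML from raw bytes; no encoding is specified by the caller,
    // the parser reads the BOM and the declaration itself.
    public static string RootValue(byte[] rawBytes)
    {
        using (var stream = new MemoryStream(rawBytes))
        {
            return XDocument.Load(stream).Root.Value;
        }
    }

    // Encodes the text, prefixed with the encoding's BOM if it defines one.
    public static byte[] Encode(string xml, Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(xml);
        byte[] result = new byte[preamble.Length + body.Length];
        preamble.CopyTo(result, 0);
        body.CopyTo(result, preamble.Length);
        return result;
    }
}
```

In the downloading scenario, the stream returned by GetResponseStream() plays the role of the MemoryStream here.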
I have a large XML file (approx. 10 MB) with the following simple structure:
<Errors>
<Error>.......</Error>
<Error>.......</Error>
<Error>.......</Error>
<Error>.......</Error>
<Error>.......</Error>
</Errors>
I need to add a new <Error> node at the end, just before the </Errors> tag. What is the fastest way to achieve this in .NET?
You need to use the XML inclusion technique.
Your error.xml (it doesn't change; it is just a stub that XML parsers read):
<?xml version="1.0"?>
<!DOCTYPE logfile [
<!ENTITY logrows
SYSTEM "errorrows.txt">
]>
<Errors>
&logrows;
</Errors>
Your errorrows.txt file (it changes; on its own it is not a well-formed XML document, so a parser cannot read it directly):
<Error>....</Error>
<Error>....</Error>
<Error>....</Error>
Then, to add an entry to errorrows.txt:
using (StreamWriter sw = File.AppendText("errorrows.txt"))
{
    XmlTextWriter xtw = new XmlTextWriter(sw);
    xtw.WriteStartElement("Error");
    // ... write error message here
    xtw.Close(); // Close() also closes the open <Error> element
}
Or you can even use the .NET 3.5 XElement and append its text to the StreamWriter:
using (StreamWriter sw = File.AppendText("errorrows.txt"))
{
    XElement element = new XElement("Error");
    // ... write error message here
    sw.WriteLine(element.ToString());
}
See also Microsoft's article Efficient Techniques for Modifying Large XML Files
First, I would disqualify System.Xml.XmlDocument because it is a DOM which requires parsing and building the entire tree in memory before it can be appended to. This means your 10 MB of text will be more than 10 MB in memory. This means it is "memory intensive" and "time consuming".
Second, I would disqualify System.Xml.XmlReader because it requires parsing the entire file first before you can get to the point of when you can append to it. You would have to copy the XmlReader into an XmlWriter since you can't modify it. This requires duplicating your XML in memory first before you can append to it.
A faster alternative to XmlDocument and XmlReader would be string manipulation (which has its own memory issues):
string xml = @"<Errors><error />...<error /></Errors>";
int idx = xml.LastIndexOf("</Errors>");
xml = xml.Substring(0, idx) + "<error>new error</error></Errors>";
Chop off the end tag, add in the new error, and add the end tag back.
I suppose you could go crazy with this and truncate your file by 9 characters and append to it. Wouldn't have to read in the file and would let the OS optimize page loading (only would have to load in the last block or something).
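byte[] newTail = System.Text.Encoding.UTF8.GetBytes("<error>new error</error></Errors>");
using (System.IO.FileStream fs = System.IO.File.Open("log.xml", System.IO.FileMode.Open, System.IO.FileAccess.ReadWrite))
{
    // Assumes a UTF-8 file that ends exactly with "</Errors>" (no trailing whitespace)
    fs.Seek(-"</Errors>".Length, System.IO.SeekOrigin.End);
    fs.Write(newTail, 0, newTail.Length);
}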
That will hit a problem if your file is empty or contains only "<Errors></Errors>", both of which can easily be handled by checking the length.
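The fastest way would probably be direct file access.
using (FileStream stream = new FileStream("my.log", FileMode.Open, FileAccess.ReadWrite))
using (StreamWriter file = new StreamWriter(stream))
{
    // A stream opened in append mode cannot seek back before its end, so open the file normally
    stream.Seek(-"</Errors>".Length, SeekOrigin.End);
    file.Write("    <Error>New error message.</Error></Errors>");
}
But you lose all the nice XML features and may easily corrupt the file.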
I would use XmlDocument or XDocument to Load your file and then manipulate it accordingly.
I would then look at the possibility of caching this XmlDocument in memory so that you can access the file quickly.
What do you need the speed for? Do you have a performance bottleneck already or are you expecting one?
How is your XML file represented in code? Do you use the System.Xml classes? In that case you could use XmlDocument.AppendChild.
Try this out:
var doc = new XmlDocument();
doc.LoadXml("<Errors><error>This is my first error</error></Errors>");
XmlNode root = doc.DocumentElement;
//Create a new node.
XmlElement elem = doc.CreateElement("error");
elem.InnerText = "This is my error";
//Add the node to the document.
if (root != null) root.AppendChild(elem);
doc.Save(Console.Out);
Console.ReadLine();
Here's how to do it in C; .NET should be similar.
The trick is simply to jump to the end of the file, skip back over the closing tag, append the new error line, and write a new closing tag.
#include <stdio.h>
#include <string.h>
int main(void) {
    // Open the file for update in binary mode so offsets are exact byte counts
    FILE *f = fopen("log.xml", "rb+");
    if (f == NULL) {
        perror("log.xml");
        return 1;
    }
    // Assumes the file ends with "</Errors>\n" (a single-byte newline);
    // use "</Errors>\r\n" here if the file has Windows line endings.
    long offset = (long)strlen("</Errors>\n");
    // Seek to the END OF FILE, then GO BACK over the end tag and newline,
    // so we use a NEGATIVE offset.
    if (fseek(f, -offset, SEEK_END) != 0) {
        perror("fseek");
        fclose(f);
        return 1;
    }
    // Print out your new error line
    fprintf(f, "<Error>New error line</Error>\n");
    // Print out new ending tag.
    fprintf(f, "</Errors>\n");
    // Close and you're done
    fclose(f);
    return 0;
}
The quickest method is likely to be reading the file with an XmlReader and simply replicating each node read to a new stream using an XmlWriter. When you reach the closing </Errors> tag, you just need to output your additional <Error> element before continuing the 'read and duplicate' cycle. This approach is inevitably harder than reading the entire document into the DOM (the XmlDocument class), but for large XML files it is much quicker. Admittedly, using StreamReader/StreamWriter would be somewhat faster still, but pretty horrible to work with in code.
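A rough sketch of that read-and-duplicate cycle is shown below. The names StreamingAppender and AppendError are invented for the example, and it makes simplifying assumptions: the prolog (XML declaration, comments before the root) is not copied, and the new element is always called Error. It relies on XmlWriter.WriteNode, which copies the current node with its subtree and advances the reader.

```csharp
using System.IO;
using System.Xml;

public static class StreamingAppender
{
    // Copies the document from input to output and adds one <Error> element
    // just before the root's closing tag. The prolog is not copied in this sketch.
    public static void AppendError(TextReader input, TextWriter output, string message)
    {
        var writerSettings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            reader.MoveToContent();                   // position on the root element
            bool rootIsEmpty = reader.IsEmptyElement; // true for <Errors/>
            string rootNamespace = reader.NamespaceURI;
            writer.WriteStartElement(reader.Prefix, reader.LocalName, rootNamespace);
            writer.WriteAttributes(reader, false);

            if (!rootIsEmpty)
            {
                reader.Read(); // move to the first child node (or the closing tag)
                // Copy each child in turn until the root's closing tag is reached
                while (reader.NodeType != XmlNodeType.EndElement && !reader.EOF)
                {
                    writer.WriteNode(reader, false);
                }
            }

            // Add the new element in the root's namespace, then close the root
            writer.WriteElementString("Error", rootNamespace, message);
            writer.WriteEndElement();
        }
    }
}
```

For a file on disk, the input and output would be a StreamReader over the existing file and a StreamWriter over a temporary file that replaces the original afterwards. Only one node at a time is held in memory, at the cost of rewriting the whole file.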
Using string-based techniques (like seeking to the end of the file and then moving backwards the length of the closing tag) is vulnerable to unexpected but perfectly legal variations in document structure.
The document could end with any amount of whitespace, to pick the likeliest problem you'll encounter. It could also end with any number of comments or processing instructions. And what happens if the top-level element isn't named Error?
And here's a situation that using string manipulation fails utterly to detect:
<Error xmlns="not_your_namespace">
...
</Error>
If you use an XmlReader to process the XML, while it may not be as fast as seeking to EOF, it will also allow you to handle all of these possible exception conditions.
I attempted to use the code other answers had suggested but ran into an issue: the value of .Length on my strings was sometimes not the same as the number of bytes in the string, so I was inconsistently losing characters. I modified the code to use the byte count instead.
var endTag = "</Errors>";
var nodeText = GetNodeText();
byte[] endTagBytes = Encoding.UTF8.GetBytes(endTag);
byte[] nodeBytes = Encoding.UTF8.GetBytes(nodeText);
using (FileStream file = File.Open("my.log", FileMode.Open, FileAccess.ReadWrite))
{
    // Step back over the closing tag, measured in bytes rather than characters
    file.Seek(-endTagBytes.Length, SeekOrigin.End);
    file.Write(nodeBytes, 0, nodeBytes.Length);
    file.Write(endTagBytes, 0, endTagBytes.Length);
}