Editing xml files and preserving whitespaces and tabs - c#

I have an XML document that would contain empty nodes that looked like the following:
<metadata territory="USA"></metadata>
After simply opening, then saving using XmlDocument, this line looks like:
<metadata territory="USA">
</metadata>
When I set PreserveWhitespace to true, it converted the entire XML to 1 line, so this won't work.
These XML files need to keep the current formatting as much as possible. I know, technically, it doesn't matter which way they are written, they will be read the same way but I still need to keep the same formatting. I can't figure out a way to keep the nodes with no values to 1 line. Is there a way to do this?
The ONLY method that keeps the document in its original formatting is if the XML file contained 'xml:space="preserve"' in the header, but I am to leave the header as is.
The only thing I want to change is the addition of values. As I said, simply loading and saving a document adds this, so if you want to test, just try...
XmlDocument doc = new XmlDocument();
doc.Load(@"C:\Temp\test.xml");
doc.Save(@"C:\Temp\test_02.xml");

Just did the test and this works using both XDocument and XmlDocument by setting the PreserveWhitespace property.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.PreserveWhitespace = true;
xmlDoc.Load("test.xml");
xmlDoc.Save("testOut.xml");
..
XDocument xdoc = XDocument.Load("test.xml", LoadOptions.PreserveWhitespace);
xdoc.Save(@"testOut.xml");
Input:
<foo>
<metadata territory="USA"></metadata>
<bar></bar>
<baz>
</baz>
</foo>
Output:
<foo>
<metadata territory="USA"></metadata>
<bar></bar>
<baz>
</baz>
</foo>

I'm with Richard Schneider: I don't believe it's possible. One possible solution is to take the output XML file and run it through an XML formatting program that normalizes the format of the XML file (you can probably write one with the unmanaged XML dom if one can't be found).
Since the file is always normalized, it won't change that much hopefully.

If you're using XmlDocument, I'd recommend using XDocument instead (.NET Framework 3.5+).
PreserveWhitespace will add a
<whatever> <...>
</whatever>
to each line while None will just close it like <... />.
I spent 5 minutes looking for a way to preserve that white space, but couldn't find one. Something is omitting char(13) during deserialization and reserialization.
XDocument doc;
using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
{
//Alternative with .None
doc = XDocument.Load(fs, LoadOptions.PreserveWhitespace);
}
and importantly..
doc.Save("lala.xml", SaveOptions.None);

I don't think this is possible. When you load the XML document you lose formatting information; so there is no way that Save can give the same results.

Why not save the file as a different format, then rename it back to XML after it's been saved? I would be surprised if it still gets formatted incorrectly. Not pretty but pretty easy.

Related

How to find nodes in XML File using Line number C#?

I have the XML file below. When I use it in a process, it gives me an error like:
[2016-06-22 19:29:53 IST] ERROR: "Line 15 column 20: character content of element "electronics" invalid; must be equal to "foods", "goods" or "mobiles" at XPath
/products/product[1]/item_type"
The XPath /products/product[1] is not correct; it is what their system returns.
So I have to find the element and replace it with the original value using the line number only.
Order.xml
<products>
<product>
<country>AE</country>
<sale>false</sale>
<item_type>foods</item_type>
</product>
<product>
<country>IN</country>
<sale>false</sale>
<item_type>goods</item_type>
</product>
<product>
<country>US</country>
<sale>false</sale>
<item_type>electronics</item_type>
</product>
<product>
<country>AM</country>
<sale>false</sale>
<item_type>mobiles</item_type>
</product>
</products>
Please let me know how to search by line number (15) in an XML document.
You should not rely on line/column numbers when dealing with XML. XML does not care much about whitespace, and formatting is merely a convenience for readability. Although the position is reported correctly in your example, I would not trust that it will always be that way in future. Interestingly, the XPath is also incorrect, but let's not dwell on that for the moment.
You need to take back control of the problem. Create your own validating reader and capture the validation error while the XML is being parsed. How you do this depends on which framework version you're using: either XmlValidatingReader (obsolete since .NET 2.0) or XmlReader with the appropriate XmlReaderSettings.
You can set up a callback/event handler to capture the error as the XML file is being read, so you can be certain you're in the correct place and have all the information you need to handle the error to hand. Using an XmlReader will also allow you to continue processing the entire file and not just stop at the first error.
The code is too big for SO and we'd need a lot more information from you to write it, but a Google search will find lots of examples, including this one from Microsoft: https://msdn.microsoft.com/en-us/library/w5aahf2a(v=vs.100).aspx
1. Construct a new XmlReaderSettings instance.
2. Add an XML schema to the Schemas property of the XmlReaderSettings instance.
3. Specify Schema as the ValidationType.
4. Specify ValidationFlags and a ValidationEventHandler to handle schema validation errors and warnings encountered during validation.
5. Pass the XmlReaderSettings object to the Create method of the XmlReader class along with the XML document, creating a schema-validating XmlReader.
6. Call Read() to run through the XML from start to end.
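As a rough sketch of the steps above: the question does not show the schema, so the file name order.xsd (and the assumption that it has no target namespace) is hypothetical, as is the class name. The handler only reports each problem with its position; what you do with it is up to you.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

class ValidateOrder
{
    static void Main()
    {
        var settings = new XmlReaderSettings();

        // Hypothetical schema file; null means "use the targetNamespace declared in the schema".
        settings.Schemas.Add(null, "order.xsd");
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;

        // Called once per validation error or warning while the reader advances.
        settings.ValidationEventHandler += (sender, e) =>
        {
            Console.WriteLine("{0} at line {1}, position {2}: {3}",
                e.Severity, e.Exception.LineNumber, e.Exception.LinePosition, e.Message);
        };

        using (XmlReader reader = XmlReader.Create("Order.xml", settings))
        {
            // Read to the end so that every error is reported, not just the first.
            while (reader.Read()) { }
        }
    }
}
```

Because a handler is attached, validation errors are reported through it instead of being thrown, so the loop keeps going to the end of the file.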
Would this work for you?
string[] lines = File.ReadAllLines(@"C:\order.xml", Encoding.UTF8);
var line15 = lines[14];
XmlDocument doc = new XmlDocument();
doc.Load(@"C:\order.xml");
XmlElement el = (XmlElement)doc.SelectSingleNode("/products/product[1]/item_type");
if (el != null) { el.ParentNode.RemoveChild(el); }
You can remove the node using the code above. With a suitable XPath you can remove every item_type that is not equal to foods, goods or mobiles.
Try this,
var xml = XDocument.Load(@"C:\order.xml", LoadOptions.SetLineInfo);
var lineNumbers = xml.Descendants()
    .Where(x => !x.Descendants().Any() && // exact node contains the value
                x.Value.Contains("foods"))
    .Cast<IXmlLineInfo>()
    .Select(x => x.LineNumber);
int getLineNumber = lineNumbers.First();
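If you need to go the other way, from a line number back to an element, a minimal sketch along the same lines is to load with LoadOptions.SetLineInfo and compare each element's IXmlLineInfo.LineNumber. The class name and the targetLine constant are illustrative, and the line number is the line on which the element's start tag begins in the file as it was loaded.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

class FindByLine
{
    static void Main()
    {
        // SetLineInfo makes the loader record line/column positions on each node.
        XDocument xml = XDocument.Load(@"C:\order.xml", LoadOptions.SetLineInfo);

        const int targetLine = 15; // line reported in the error message

        // XElement implements IXmlLineInfo explicitly, hence the cast.
        XElement match = xml.Descendants().FirstOrDefault(e =>
        {
            IXmlLineInfo info = e;
            return info.HasLineInfo() && info.LineNumber == targetLine;
        });

        if (match != null)
        {
            Console.WriteLine("{0}: {1}", match.Name, match.Value);
        }
        else
        {
            Console.WriteLine("No element starts on line " + targetLine);
        }
    }
}
```

This only works if the file is byte-for-byte the one the other system validated; any reformatting in between shifts the line numbers, which is the reason for the warning above about relying on them.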

Reading Xml files with umlaut chars

I asked this question yesterday and got a reply:
Writing encoded values for umlauts
In the code the parse method works if it's a string like so:
XDocument xDoc = XDocument.Parse("<description>Top Shelf-ÖÄÜookcase</description>");
To pass the input xml file as string, I have to read it first. The read method will fail if there are umlauts in the input xml.
How do I get past that?
Tried both Load and Parse methods of XDocument.
Load:
Invalid character in the given encoding. Line 3, position 35.
Parse:
Data at the root level is invalid. Line 1, position 1.
Here is a sample xml after using CDATA:
<?xml version="1.0" encoding="utf-8"?>
<kal>
<description><![CDATA[Top Shelf-ÖÄÜookcase]]> </description>
</kal>
Change encoding to "iso-8859-1"
Have you tried wrapping the description data with a CDATA?
<description><![CDATA[Top Shelf-ÖÄÜookcase]]> </description>
Special characters don't particularly parse well in XML unless you wrap them with CDATA.
As Besi stated, you have to use the correct encoding of the xml-file in order to achieve correct handling of the umlauts.
Even though you said that the creation of the incoming XML file is not in your hands, you can still control the encoding used for parsing the XML by using a dedicated StreamReader:
// create your XDocument
XDocument Doc;
// set up a StreamReader for your file, specifying the encoding you need
using (StreamReader Reader = new StreamReader(@"C:\your-file.xml", System.Text.Encoding.GetEncoding("ISO-8859-1")))
{
    // PARSE the STRING that is RETURNED from the StreamReader.ReadToEnd() method
    Doc = XDocument.Parse(Reader.ReadToEnd());
}

Reading oddly-formatted XML file C#

I need some help with reading an oddly-formatted XML file. Because of the way the nodes and attributes are structured, I keep running into XMLException errors (at least, that's what the output window is telling me; my breakpoints refuse to fire so that I can check it). Anyway, here's the XML. Anyone experienced anything like this before?
<ApplicationMonitoring>
<MonitoredApps>
<Application>
<function1 listenPort="5000"/>
</Application>
<Application>
<function2 listenPort="6000"/>
</Application>
</MonitoredApps>
<MIBs>
<site1 location="test.mib"/>
</MIBs>
<Community value="public"/>
<proxyAgent listenPort="161" timeOut="2"/>
</ApplicationMonitoring>
Cheers
EDIT: Current version of the parsing code (file path shortened; I'm not actually using this one):
XmlDocument xml = new XmlDocument();
xml.LoadXml(@"..\..\..\ApplicationMonitoring.xml");
string port = xml.DocumentElement["proxyAgent"].InnerText;
Your problem in loading the XML is that xml.LoadXml expects you to pass the XML document itself as a string, not a file path.
Try instead using:
xml.Load(@"..\..\..\ApplicationMonitoring.xml");
Essentially in your original code you are telling it that your xml document is
..\..\..\ApplicationMonitoring.xml
And I'm sure you can now see why there is a parse exception. :) I've tested this with your XML document and the modified load, and it works fine (except for the issue that Only Bolivian Here pointed out: your InnerText is not going to return anything).
For completeness you probably want:
XmlDocument xml = new XmlDocument();
xml.Load(@"..\..\..\ApplicationMonitoring.xml");
string port = xml.DocumentElement["proxyAgent"].Attributes["listenPort"].Value;
//And to get stuff more specifically in the tree something like this
string function1 = xml.SelectSingleNode("//function1").Attributes["listenPort"].Value;
Note the use of the Value property on the attribute and not the ToString method which won't do what you are expecting.
Exactly how you extract the data from the XML probably depends on what you are doing with it. For example, you may want to get a list of Application nodes to enumerate with a foreach by calling xml.SelectNodes("//Application").
If you are having trouble with extracting data, though, that is probably the scope of a different question, since this one was just about how to get the XML document loaded.
xml.DocumentElement["proxyAgent"].InnerText;
The proxyAgent element is self-closing. InnerText returns the text inside an XML element; in this case, there is none.
You need to access an attribute of the element, not the InnerText.
Try this:
string port = xml.GetElementsByTagName("proxyAgent")[0].Attributes["listenPort"].Value;
Or use Linq to XML:
http://msdn.microsoft.com/en-us/library/bb387098.aspx
And... your XML is not malformed...

Append xml document to bottom of existing xml doc

I have an xml document and want to append another xml at the bottom of it. Using the xml classes in .NET, what is the quickest way to do this (in 3.5)?
Thanks
Quickest as in most efficient, or quickest as in simplest? For example:
XDocument doc1 = XDocument.Load(...);
XDocument doc2 = XDocument.Load(...);
// Copy the root element of doc2 to the end of doc1
doc1.Root.Add(doc2.Root);
doc1.Save(...);
Alternatively, you may want:
// Copy the *contents* of the root element of doc2 to the end of doc1
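// (Elements() returns only the direct children; Descendants() would also re-add every nested element)
doc1.Root.Add(doc2.Root.Elements());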
If you can be more precise about your requirements, we may be able to help more. Note that an XML document can only have one root element, so you can't just put one document after another.
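To make the difference between the two options concrete, here is a small self-contained sketch. The two inline documents and the class name are made up for illustration; XDocument.Parse stands in for Load so that no files are needed.

```csharp
using System;
using System.Xml.Linq;

class AppendExample
{
    static void Main()
    {
        // Illustrative documents, not taken from the question.
        XDocument first = XDocument.Parse("<root><a/></root>");
        XDocument second = XDocument.Parse("<other><b/><c/></other>");

        // Option 1: nest the whole second root inside the first root.
        XDocument nested = new XDocument(first);
        nested.Root.Add(second.Root);
        Console.WriteLine(nested.ToString(SaveOptions.DisableFormatting));

        // Option 2: copy only the child elements of the second root.
        XDocument merged = new XDocument(first);
        merged.Root.Add(second.Root.Elements());
        Console.WriteLine(merged.ToString(SaveOptions.DisableFormatting));
    }
}
```

In both cases Add clones the elements, because they already have a parent in the second document, so the second document is left untouched.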
I doubt that you will be able to do this using the XML classes. XML libraries typically aim to protect you from creating poorly-formed XML, and the concatenation of two XML documents will be poorly formed because the document node will have two child elements.
If the .Net libraries do allow you to do this, I suggest you raise it as a bug.
var xml = new XmlDocument();
xml.AppendChild(...);
xml.PrependChild(...);
If you really want to add a second root node, the fastest way would be to read the second file line by line and append it to the first file. That's a very dirty way and you'll get an invalid XML file!
System.IO.StreamWriter file1 = System.IO.File.AppendText(path);
System.IO.StreamReader file2 = new System.IO.StreamReader(path2);
while (!file2.EndOfStream)
{
    file1.WriteLine(file2.ReadLine());
}
file1.Close();
file2.Close();
I don't even like this solution!

How to make XmlDocument not put spaces in self-closing tags?

I have a well-formed XML document without any spaces. It must stay that way.
When I load it into an XmlDocument to sign it, the self-closing tags get an extra white space and
<cEAN/>
becomes:
<cEAN />
Since this document must be signed, it's impossible to remove the white space afterwards.
The PreserveWhitespace property doesn't make any difference to the result.
How can I change this behavior?
There is no space before the closing "/" in the XmlDocument. XmlDocument is a data structure consisting of nodes. It is binary. It is not text.
Any extra space you are seeing exists only when you serialize the document as text.
Are you actually having a problem with signing, or do you only think you will have such a problem?
I have had this problem before. XML signed by a basic Hash so it can't change when serialized. I solved it by writing a serializer so that I could be sure that it would output the correct XML.
The basic recipe is to Read the XML with a XMLReader, and write out each chunk as it comes.
Try this:
XmlDocument doc;
...
string XMLstring = doc.OuterXml.Replace(" />","/>");
