Replace OuterXml OR generate indented string from InnerXml - c#

I have a UI that uses the DataGridView to display the content of XML files.
If an XmlNode contains only InnerText, it's quite simple; however, I'm having a problem with nodes that contain child nodes (not just a string).
Simple
<node>value</node>
Displayed as "value" in DataGridViewCell.
Complex
<node>
<foo>bar</foo>
<foo2>bar</foo2>
</node>
The problem is that the InnerXml string is not indented, which makes it very hard to edit in the UI.
I've tried using XmlTextWriter to "beautify" the string. It works quite well, but it requires an XmlNode (so the output includes the node itself, not only its child nodes), and I cannot assign the result back to InnerXml.
I would like to see either the following in the UI:
<foo>bar</foo>
<foo2>bar</foo2>
(this can be assigned to InnerXml afterwards)
Or
<node>
  <foo>bar</foo>
  <foo2>bar</foo2>
</node>
(and find a way to replace OuterXml with this string).
Thanks for any ideas,
Martin

You can load the OuterXml into an XElement, then use String.Join() to join all child elements of the root node (in other words, the InnerXml), separated by line breaks. XElement.ToString() indents its output by default. For example:
using System;
using System.Linq;
using System.Xml.Linq;

XElement e = XElement.Parse(something.OuterXml);
var result = string.Join(
    Environment.NewLine,
    e.Elements().Select(o => o.ToString())
);
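As an alternative to wrapping the node in an XElement, the XmlWriter approach from the question can be made to work on the child nodes alone. The following is a minimal sketch, assuming an XmlDocument-based node; `InnerXmlFormatter` and `IndentInnerXml` are hypothetical names. It writes each child node to an XmlWriter configured with `ConformanceLevel.Fragment`, which permits several top-level elements, so the result can be shown in the grid and later assigned back to `InnerXml`.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper: indents only the children of a node, not the node itself.
public static class InnerXmlFormatter
{
    public static string IndentInnerXml(XmlNode node)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "  ",
            // Fragment mode allows several sibling elements at the top level.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            // Write each child separately so the parent's own tags are never emitted.
            foreach (XmlNode child in node.ChildNodes)
            {
                child.WriteTo(writer);
            }
        } // Disposing the writer flushes it into the StringBuilder.

        return sb.ToString();
    }
}
```

For example, `cell.Value = InnerXmlFormatter.IndentInnerXml(node);` and, after editing, `node.InnerXml = (string)cell.Value;`, where `cell` stands for the DataGridViewCell being edited. Assigning to InnerXml throws an XmlException if the edited text is not well-formed, so the UI may want to catch that.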

Related

Remove empty XML tags

I am looking for a good approach that can remove empty tags from XML efficiently. What do you recommend? Regex? XDocument? XmlTextReader?
For example,
const string original =
@"<?xml version=""1.0"" encoding=""utf-16""?>
<pet>
<cat>Tom</cat>
<pig />
<dog>Puppy</dog>
<snake></snake>
<elephant>
<africanElephant></africanElephant>
<asianElephant>Biggy</asianElephant>
</elephant>
<tiger>
<tigerWoods></tigerWoods>
<americanTiger></americanTiger>
</tiger>
</pet>";
Could become:
const string expected =
@"<?xml version=""1.0"" encoding=""utf-16""?>
<pet>
<cat>Tom</cat>
<dog>Puppy</dog>
<elephant>
<asianElephant>Biggy</asianElephant>
</elephant>
</pet>";

Loading your original into an XDocument and using the following code gives your desired output:
var document = XDocument.Parse(original);
document.Descendants()
.Where(e => e.IsEmpty || String.IsNullOrWhiteSpace(e.Value))
.Remove();

This is meant to be an improvement on the accepted answer to handle attributes:
XDocument xd = XDocument.Parse(original);
xd.Descendants()
.Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration || string.IsNullOrWhiteSpace(a.Value))
&& string.IsNullOrWhiteSpace(e.Value)
&& e.Descendants().SelectMany(c => c.Attributes()).All(ca => ca.IsNamespaceDeclaration || string.IsNullOrWhiteSpace(ca.Value))))
.Remove();
The idea here is to check that all attributes on an element are also empty before removing it. There is also the case where empty descendants can have non-empty attributes, so I added a third condition that checks that all attributes among the element's descendants are empty as well. Consider the following document, with node8 added:
<root>
<node />
<node2 blah='' adf='2'></node2>
<node3>
<child />
</node3>
<node4></node4>
<node5><![CDATA[asdfasdf]]></node5>
<node6 xmlns='urn://blah' d='a'/>
<node7 xmlns='urn://blah2' />
<node8>
<child2 d='a' />
</node8>
</root>
This would become:
<root>
<node2 blah="" adf="2"></node2>
<node5><![CDATA[asdfasdf]]></node5>
<node6 xmlns="urn://blah" d="a" />
<node8>
<child2 d="a" />
</node8>
</root>
The original and improved answer to this question would lose the node2, node6, and node8 nodes. Checking for e.IsEmpty would work if you only want to strip out nodes like <node />, but it's redundant if you're going for both <node /> and <node></node>. If you also need to remove empty attributes, you could do this:
xd.Descendants().Attributes().Where(a => string.IsNullOrWhiteSpace(a.Value)).Remove();
xd.Descendants()
.Where(e => (e.Attributes().All(a => a.IsNamespaceDeclaration))
&& string.IsNullOrWhiteSpace(e.Value))
.Remove();
which would give you:
<root>
<node2 adf="2"></node2>
<node5><![CDATA[asdfasdf]]></node5>
<node6 xmlns="urn://blah" d="a" />
</root>

As always, it depends on your requirements.
Do you know how the empty tags will appear (e.g. <pig />, <pig></pig>, etc.)? I usually do not recommend using regular expressions (they are really useful, but at the same time they are evil). A string.Replace approach also seems problematic unless your XML has a known, fixed structure.
Finally, I would recommend using an XML parser approach (make sure your input is valid XML).
var doc = XDocument.Parse(original);
var emptyElements = from descendant in doc.Descendants()
where descendant.IsEmpty || string.IsNullOrWhiteSpace(descendant.Value)
select descendant;
emptyElements.Remove();

Anything you use will have to pass through the file at least once. If it's just a single named tag that you know, then regex is your friend; otherwise, use a stack approach. Start with the parent tag, and if it has a sub-tag, push it onto the stack. If you find an empty tag, remove it. Once you have gone through the child tags and reached the closing tag of the element on top of the stack, pop it and check it as well; if it's empty, remove it too. This way you can remove all empty tags, including tags whose children were all empty.
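The stack approach described above can be sketched in C# against an in-memory XDocument rather than a raw file stream (an assumption made to keep the example short). `EmptyElementRemover` and `RemoveEmptyElements` are hypothetical names. The explicit stack gives a post-order walk: an element is checked only after all of its children, so a parent such as <tiger> whose children were all removed is itself seen as empty. This sketch also treats an element with any attribute as non-empty, which is a choice of the sketch rather than something the answer specifies.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper illustrating stack-based (post-order) removal of empty elements.
public static class EmptyElementRemover
{
    public static void RemoveEmptyElements(XElement root)
    {
        // Each entry records whether the element's children have already been pushed.
        var stack = new Stack<(XElement Element, bool ChildrenVisited)>();
        stack.Push((root, false));

        while (stack.Count > 0)
        {
            var (element, childrenVisited) = stack.Pop();

            if (!childrenVisited)
            {
                // Come back to this element after all of its children are processed.
                stack.Push((element, true));
                foreach (XElement child in element.Elements())
                {
                    stack.Push((child, false));
                }
                continue;
            }

            // All children are done; remove this element if nothing meaningful is left.
            // Assumption: elements with attributes are kept, and the root is never removed.
            bool isEmpty = !element.HasElements
                && !element.HasAttributes
                && string.IsNullOrWhiteSpace(element.Value);
            if (isEmpty && element != root)
            {
                element.Remove();
            }
        }
    }
}
```

Applied to the <pet> document from the question, this leaves cat, dog, and elephant (with only asianElephant inside), matching the expected output.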
If you are after a reg ex expression use this

XDocument is probably simplest to implement, and will give adequate performance if you know your documents are reasonably small.
XmlTextReader will be faster and use less memory than XDocument when processing very large documents.
Regex is best for handling text rather than XML. It might not handle all edge cases as you would like (e.g. a tag within a CDATA section; a tag with an xmlns attribute), so it is probably not a good idea for a general implementation, but it may be adequate depending on how much control you have over the input XML.

XmlTextReader is preferable if we are talking about performance (it provides fast, forward-only access to XML). You can determine whether a tag is empty using the XmlReader.IsEmptyElement property; note that it is true only for the self-closing form (<pig />), not for <snake></snake>.
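To see what IsEmptyElement actually reports, here is a minimal sketch (the class and method names are hypothetical) that streams through a string with a forward-only XmlReader and collects the names of self-closing elements. Because <snake></snake> is not reported, a complete XmlReader-based cleaner would also need to hold back start tags until it knows whether any content follows, for example with the stack approach from the previous answer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: lists elements written in self-closing form, e.g. <pig />.
public static class EmptyElementScanner
{
    public static List<string> SelfClosingElementNames(string xml)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Forward-only pass; nothing is kept in memory except the result list.
            while (reader.Read())
            {
                // IsEmptyElement is true only for <x />, never for <x></x>.
                if (reader.NodeType == XmlNodeType.Element && reader.IsEmptyElement)
                {
                    names.Add(reader.LocalName);
                }
            }
        }
        return names;
    }
}
```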
XDocument approach which produces desired output:
public static bool IsEmpty(XElement n)
{
return n.IsEmpty
|| (string.IsNullOrEmpty(n.Value)
&& (!n.HasElements || n.Elements().All(IsEmpty)));
}
var doc = XDocument.Parse(original);
var emptyNodes = doc.Descendants().Where(IsEmpty);
foreach (var emptyNode in emptyNodes.ToArray())
{
emptyNode.Remove();
}

XmlElement to string conversion

Is there some simple way to convert an XmlElement to a string?

This will get the text content of the element:
element.InnerText
This will get the content of the element as XML:
element.InnerXml
This will get the element and its content as XML
element.OuterXml

You can look at the InnerText property of the element.
However, without further details of exactly what you are looking for, I can't help more.
Update:
Seeing as you want the XML of all nodes, using InnerXml or OuterXml should do nicely.

Let's say you have this XmlElement:
<node>
Hello
<effect color="pink">
World
</effect>
</node>
With Console.Write(xmlElement.InnerXml) you see the inside of your node:
Hello <effect color="pink">World</effect>
With Console.Write(xmlElement.OuterXml) you get everything:
<node>Hello <effect color="pink">World</effect></node>
With Console.Write(xmlElement.Value) you get nothing, because Value always returns null for an XML element.
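The difference between these properties is easiest to see side by side. This minimal sketch (the class and method names are hypothetical) loads the compact form of the example above, without the line breaks, so the resulting strings are exact rather than containing extra whitespace.

```csharp
using System;
using System.Xml;

// Hypothetical demo of the common string views of an XmlElement.
public static class XmlElementStringDemo
{
    public static XmlElement LoadNode()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<node>Hello <effect color=\"pink\">World</effect></node>");
        return doc.DocumentElement;
    }

    public static void Print()
    {
        XmlElement element = LoadNode();
        Console.WriteLine(element.Value == null); // Value is always null for an element
        Console.WriteLine(element.InnerText);     // text only, markup stripped
        Console.WriteLine(element.InnerXml);      // child markup, without <node>
        Console.WriteLine(element.OuterXml);      // the element itself plus its children
    }
}
```

Print() writes True, then Hello World, then Hello <effect color="pink">World</effect>, then <node>Hello <effect color="pink">World</effect></node>.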

Why doesn't XDocument.Parse() parse my XML properly?

I am trying to use XDocument.Parse(string s) to parse some XML that is being returned from a REST-based API. After the XML is parsed, it creates a new XDocument, but the document doesn't contain the properly parsed XML nodes. The name of the first node is the correct node name, but the value is the concatenation of all the text from the XML, regardless of which element it belongs to. Can anybody help me figure out what is going on?
XML
<sci_reply version="1.0">
<send_message>
<device id="00000000-00000000-00000000-00000000">
<error id="303">
<desc>Invalid target. Device not found.</desc>
</error>
</device>
<error>Invalid SCI request. No valid targets found.</error>
</send_message>
</sci_reply>
Debug View of XDocument Object

That's the expected behavior. The Value of an XML element is the concatenation of the text of all its descendants. If you want to work with the actual XML structure, read up on LINQ to XML (the classes in the System.Xml.Linq namespace) and navigate to the elements you need.
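For the sci_reply document in the question, navigating by element name returns the individual pieces instead of the concatenated root Value. A minimal sketch; `SciReplyReader` and its method names are hypothetical, and the element and attribute names come from the XML in the question.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads specific parts of a sci_reply document.
public static class SciReplyReader
{
    // Text of the first <desc> element at any depth, or null if there is none.
    public static string FirstDescription(XDocument doc)
    {
        return doc.Descendants("desc").Select(d => d.Value).FirstOrDefault();
    }

    // The "id" attribute of every <error> element that has one.
    public static string[] ErrorIds(XDocument doc)
    {
        return doc.Descendants("error")
                  .Select(e => (string)e.Attribute("id")) // null when the attribute is missing
                  .Where(id => id != null)
                  .ToArray();
    }
}
```

By contrast, doc.Root.Value on the same document returns the two messages run together, which is exactly what the debugger showed.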

That's just the debugger being nice.
The root is being displayed with all of its children.

Get XmlNode Open Tag with Attributes

Is it possible to get the opening tag of an XmlNode, with all attributes, namespaces, etc.?
For example:
<root xmlns="urn:..." rattr="a">
<child attr="1">test</child>
</root>
I would like to retrieve the entire opening tag from the XmlNode, exactly as it appears in the original XML document if possible, and later the closing tag, both as strings.
Basically XmlNode.OuterXml without the child nodes.
EDIT
To elaborate, XmlNode.OuterXml on a node that was created with the XML above would return the entire XML fragment, including child nodes as a single string.
XmlNode.InnerXml on that same fragment would return the child nodes but not the parent node, again as a single string.
But I need the opening tag for the XML fragment without the children nodes. And without building it using the XmlAttribute array, LocalName, Namespace, etc.
This is C# 3.5
Thanks

Is there some reason you can't simply say:
string s = n.OuterXml.Substring(0, n.OuterXml.IndexOf(">") + 1);

I think the simplest way would be to call XmlNode.CloneNode(false) which (according to the docs) will clone all the attributes but not child nodes. You can then use OuterXml - although that will give you the closing tag as well.
For example:
using System;
using System.Xml;

public class Test
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(@"<root xmlns='urn:xyz' rattr='a'>
            <child attr='1'>test</child></root>");
        XmlElement root = doc.DocumentElement;
        // Shallow clone: attributes are copied, child nodes are not
        XmlNode clone = root.CloneNode(false);
        Console.WriteLine(clone.OuterXml);
    }
}
Output:
<root xmlns="urn:xyz" rattr="a"></root>
Note that this may not be exactly as per the original XML document, in terms of the ordering of attributes etc. However, it will at least be equivalent.
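Building on the CloneNode(false) idea, the opening and closing tags can be split into two strings, which is what the question asks for. A minimal sketch; `TagExtractor` and `GetTags` are hypothetical names. Setting `IsEmpty = false` on the clone makes OuterXml end with an explicit closing tag rather than the `/>` form, so the closing tag can be cut off by length. As noted above, attribute order and quoting follow the serializer, not necessarily the original text.

```csharp
using System.Xml;

// Hypothetical helper: opening and closing tag of an element, without its children.
public static class TagExtractor
{
    public static (string Open, string Close) GetTags(XmlElement element)
    {
        // Shallow clone keeps attributes (including xmlns) but drops child nodes.
        var clone = (XmlElement)element.CloneNode(false);
        // Force <tag ...></tag> serialization instead of <tag ... />.
        clone.IsEmpty = false;

        string outer = clone.OuterXml;
        string close = "</" + clone.Name + ">"; // Name includes any prefix
        string open = outer.Substring(0, outer.Length - close.Length);
        return (open, close);
    }
}
```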

How about:
xmlNode.OuterXml.Replace(xmlNode.InnerXml, String.Empty);
Poor man's solution :)

Find a node in an XML file - performance improvement in C#

Say I need to find a particular node in an XML file, using C#.
<node attribute="my-target-attribute">
"my-target-attribute" is a variable input at runtime.
The node is in no particular place in the XML file, I basically just need to scan the entire XML hierarchy until I find a node with the matching attribute.
Is there any way I can pre-process the XML so finding the node will be faster? I need to keep the original XML structure in place. The XML file could potentially have 10,000 nodes.

You can certainly preprocess the XML to make lookups faster:
Dictionary<string, XmlElement> elementMap = new Dictionary<string, XmlElement>();
AddElementToMap(doc.DocumentElement, elementMap);
...
private void AddElementToMap(XmlElement elm, Dictionary<string, XmlElement> elementMap)
{
elementMap[elm.GetAttribute("attribute")] = elm;
foreach (XmlElement child in elm.SelectNodes("node"))
{
AddElementToMap(child, elementMap);
}
}
Once you've done this, a lookup is simple:
XmlElement elm = elementMap[value];
That code assumes that every element in your document is named "node", that every one has an attribute named "attribute", and that all of the attribute values are unique. The code's more complicated if any of those conditions are untrue, but not exceptionally so.
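If the elements do not all share the name "node", a variant can visit every element regardless of name. A minimal sketch, assuming the attribute name is known up front; `AttributeIndex` and `Build` are hypothetical names. GetElementsByTagName("*") returns all elements in document order, and on duplicate values this version keeps the first match rather than overwriting it.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: one pass over all elements, indexing them by an attribute's value.
public static class AttributeIndex
{
    public static Dictionary<string, XmlElement> Build(XmlDocument doc, string attributeName)
    {
        var map = new Dictionary<string, XmlElement>();
        // "*" matches every element; the list is in document (pre-order) order.
        foreach (XmlElement element in doc.GetElementsByTagName("*"))
        {
            if (!element.HasAttribute(attributeName))
            {
                continue;
            }
            string value = element.GetAttribute(attributeName);
            // Keep the first element for a duplicated value instead of overwriting it.
            if (!map.ContainsKey(value))
            {
                map.Add(value, element);
            }
        }
        return map;
    }
}
```

After a single Build call, each lookup is a dictionary access such as `map["my-target-attribute"]`.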

You could use XSLT to transform the XML so that the node is at a known depth. Then, when you select with XPath, you can select accordingly without using the // operator.

With VTD-XML (http://vtd-xml.sf.net) you can index the XML document into VTD+XML,
which eliminates the overhead of parsing:
http://www.codeproject.com/KB/XML/VTD-XML-indexing.aspx

Similar to another answer, you can use XPath, e.g. SelectNodes("//*[@attribute='my-target-attribute']"). The // operator searches for nodes at any depth.
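With the attribute value supplied at runtime, the XPath from this answer can be built as a string and evaluated with SelectSingleNode. A minimal sketch; `XPathLookup` and `FindByAttribute` are hypothetical names. The value is placed inside an XPath 1.0 string literal, which has no escape for its own quote character, so this sketch rejects values containing an apostrophe instead of trying to quote them. It still scans the document on every call, so for many repeated lookups the dictionary approach is likely the better fit.

```csharp
using System;
using System.Xml;

// Hypothetical helper: XPath lookup of an element by attribute value, at any depth.
public static class XPathLookup
{
    public static XmlElement FindByAttribute(XmlDocument doc, string value)
    {
        // XPath 1.0 literals cannot contain their own delimiter, so refuse such input.
        if (value.Contains("'"))
        {
            throw new ArgumentException("Value must not contain an apostrophe.", nameof(value));
        }
        // //* matches elements of any name at any depth; null is returned if nothing matches.
        return (XmlElement)doc.SelectSingleNode("//*[@attribute='" + value + "']");
    }
}
```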
