I am writing an XML file using XDocument and want to adjust the file layout. Here is an extract of the generated file:
<ROOTELEMENT>
<CHILDELEMENT>
<INFO1>Test a 1</INFO1>
<INFO2>Test a 2</INFO2>
</CHILDELEMENT>
<CHILDELEMENT>
<INFO1>Test b 1</INFO1>
<INFO2>Test b 2</INFO2>
</CHILDELEMENT>
</ROOTELEMENT>
I want my file to look like this instead:
<ROOTELEMENT>
<CHILDELEMENT><INFO1>Test a 1</INFO1><INFO2>Test a 2</INFO2></CHILDELEMENT>
<CHILDELEMENT><INFO1>Test b 1</INFO1><INFO2>Test b 2</INFO2></CHILDELEMENT>
</ROOTELEMENT>
Here is my code:
var myDoc = new XDocument(new XElement("ROOTELEMENT",
new XElement("CHILDELEMENT",
new XElement("INFO1", "Test a 1"),
new XElement("INFO2", "Test a 2")),
new XElement("CHILDELEMENT",
new XElement("INFO1", "Test b 1"),
new XElement("INFO2", "Test b 2"))));
myDoc.Save("Test.xml");
Formatting of XML output is controlled by XmlWriter, not by XElement or XDocument. If you need precise control over formatting, you will need to create your own writer by subclassing one of the XmlWriter implementations, specifically XmlTextWriter, whose Formatting property is mutable and can be changed during writing.
For instance, here is a version that disables indentation when the element depth exceeds 1:
public class CustomFormattingXmlTextWriter : XmlTextWriter
{
readonly Stack<Formatting> stack = new Stack<Formatting>();
public CustomFormattingXmlTextWriter(TextWriter writer, int indentDepth) : base(writer)
{
this.Formatting = Formatting.Indented;
this.IndentDepth = indentDepth;
}
int IndentDepth { get; }
void OnElementStarted(string localName, string ns)
{
stack.Push(Formatting);
// You could e.g. modify the logic here to check to see whether localName == CHILDELEMENT
// if (localName == "CHILDELEMENT")
if (stack.Count == IndentDepth+1)
Formatting = Formatting.None;
}
void OnElementEnded()
{
var old = stack.Pop();
if (old != Formatting)
Formatting = old;
}
public override void WriteStartElement(string prefix, string localName, string ns)
{
base.WriteStartElement(prefix, localName, ns);
OnElementStarted(localName, ns);
}
public override void WriteEndElement()
{
base.WriteEndElement();
OnElementEnded();
}
public override void WriteFullEndElement()
{
base.WriteFullEndElement();
OnElementEnded();
}
}
public static partial class XNodeExtensions
{
public static void SaveWithCustomFormatting(this XDocument doc, string filename, int indentDepth)
{
using (var textWriter = new StreamWriter(filename))
using (var writer = new CustomFormattingXmlTextWriter(textWriter, indentDepth))
{
doc.Save(writer);
}
}
}
Using it, you can do:
myDoc.SaveWithCustomFormatting(fileName, 1);
which outputs, as required:
<ROOTELEMENT>
<CHILDELEMENT><INFO1>Test a 1</INFO1><INFO2>Test a 2</INFO2></CHILDELEMENT>
<CHILDELEMENT><INFO1>Test b 1</INFO1><INFO2>Test b 2</INFO2></CHILDELEMENT>
</ROOTELEMENT>
Notes:
You can modify the logic in CustomFormattingXmlTextWriter.OnElementStarted() to disable the formatting using any criteria you desire such as checking to see whether the incoming localName is CHILDELEMENT.
Microsoft recommends creating writers with XmlWriter.Create() rather than using XmlTextWriter, but the writers it returns do not have a mutable Formatting property. If you must use XmlWriter, you might look at Is there a way to serialize multiple XElements onto the same line?.
Demo fiddle #1 here.
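As a rough illustration of the XmlWriter route mentioned in the notes, here is a minimal sketch that turns automatic indentation off and writes the line breaks by hand with WriteWhitespace(). The class and method names (CompactChildrenWriter, Write) are hypothetical, and the sketch assumes the root element has no attributes or namespace declarations.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class CompactChildrenWriter
{
    // Hypothetical helper: puts the root's tags on their own lines and
    // writes each direct child element on a single line.
    // Assumption: the root has no attributes or namespace declarations.
    public static string Write(XDocument doc, string indent = "  ")
    {
        var settings = new XmlWriterSettings
        {
            Indent = false,                         // no automatic indentation
            OmitXmlDeclaration = true,
            NewLineHandling = NewLineHandling.None  // leave our "\n" untouched
        };

        var sw = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            XElement root = doc.Root;
            writer.WriteStartElement(root.Name.LocalName, root.Name.NamespaceName);
            foreach (XElement child in root.Elements())
            {
                // Line break plus indentation before each child, written by hand.
                writer.WriteWhitespace("\n" + indent);
                // The child is written without any formatting, so it stays on one line.
                child.WriteTo(writer);
            }
            writer.WriteWhitespace("\n");
            writer.WriteEndElement();
        }
        return sw.ToString();
    }
}
```

Calling CompactChildrenWriter.Write(myDoc) returns the text as a string, which you could then write to disk with File.WriteAllText(). Unlike the XmlTextWriter subclass, this only handles one fixed depth.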
I'm using this code to convert from an XElement to OpenXmlElement
internal static OpenXmlElement ToOpenXml(this XElement xel)
{
using (var sw = new StreamWriter(new MemoryStream()))
{
sw.Write(xel.ToString());
sw.Flush();
sw.BaseStream.Seek(0, SeekOrigin.Begin);
var re = OpenXmlReader.Create(sw.BaseStream);
re.Read();
var oxe = re.LoadCurrentElement();
re.Close();
return oxe;
}
}
Before the conversion I have an XElement
<w:ind w:firstLine="0" w:left="0" w:right="0"/>
After the conversion it looks like this
<w:ind w:firstLine="0" w:end="0" w:start="0"/>
This element then fails OpenXml validation using the following
var v = new OpenXmlValidator();
var errs = v.Validate(doc);
With the errors being reported:
Description="The 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:start' attribute is not declared."
Description="The 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:end' attribute is not declared."
Do I need to do other things to add these attributes to the schema or do I need to find a new way to convert from XElement to OpenXml?
I'm using the nuget package DocumentFormat.OpenXml ver 2.9.1 (the latest).
EDIT: Looking at the OpenXml standard, it seems that both left/start and right/end should be recognised which would point to the OpenXmlValidator not being quite correct. Presumably I can just ignore those validation errors then?
Many thx
The short answer is that you can indeed ignore those specific validation errors. The OpenXmlValidator is not up-to-date in this case.
I would additionally offer a more elegant implementation of your ToOpenXml method (note the using declarations, which were added in C# 8.0).
internal static OpenXmlElement ToOpenXmlElement(this XElement element)
{
// Write XElement to MemoryStream.
using var stream = new MemoryStream();
element.Save(stream);
stream.Seek(0, SeekOrigin.Begin);
// Read OpenXmlElement from MemoryStream.
using OpenXmlReader reader = OpenXmlReader.Create(stream);
reader.Read();
return reader.LoadCurrentElement();
}
If you don't use C# 8.0 or using declarations, here's the corresponding code with using statements.
internal static OpenXmlElement ToOpenXmlElement(this XElement element)
{
using (var stream = new MemoryStream())
{
// Write XElement to MemoryStream.
element.Save(stream);
stream.Seek(0, SeekOrigin.Begin);
// Read OpenXmlElement from MemoryStream.
using (OpenXmlReader reader = OpenXmlReader.Create(stream))
{
reader.Read();
return reader.LoadCurrentElement();
}
}
}
Here's the corresponding unit test, which also demonstrates that you'd have to pass a w:document to have the w:ind element's attributes changed by the Indentation instance created in the process.
public class OpenXmlReaderTests
{
private const string NamespaceUriW = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
private static readonly string XmlnsW = $"xmlns:w=\"{NamespaceUriW}\"";
private static readonly string IndText =
$@"<w:ind {XmlnsW} w:firstLine=""10"" w:left=""20"" w:right=""30""/>";
private static readonly string DocumentText =
$@"<w:document {XmlnsW}><w:body><w:p><w:pPr>{IndText}</w:pPr></w:p></w:body></w:document>";
[Fact]
public void ConvertingDocumentChangesIndProperties()
{
XElement element = XElement.Parse(DocumentText);
var document = (Document) element.ToOpenXmlElement();
Indentation ind = document.Descendants<Indentation>().First();
Assert.Null(ind.Left);
Assert.Null(ind.Right);
Assert.Equal("10", ind.FirstLine);
Assert.Equal("20", ind.Start);
Assert.Equal("30", ind.End);
}
[Fact]
public void ConvertingIndDoesNotChangeIndProperties()
{
XElement element = XElement.Parse(IndText);
var ind = (OpenXmlUnknownElement) element.ToOpenXmlElement();
Assert.Equal("10", ind.GetAttribute("firstLine", NamespaceUriW).Value);
Assert.Equal("20", ind.GetAttribute("left", NamespaceUriW).Value);
Assert.Equal("30", ind.GetAttribute("right", NamespaceUriW).Value);
}
}
My goal is to output a modified XML file and preserve a special indentation that was present in the original file. That way the resulting file still looks like the original, which makes the two easier to compare and merge through source control.
My program will read an XML file and add or change one specific attribute.
Here is the formatting I'm trying to achieve / preserve:
<Base Import="..\commom\style.xml">
<Item Width="480"
Height="500"
VAlign="Center"
Style="level1header">
(...)
In this case, I simply wish to align all attributes past the first one with the first one.
XmlWriterSettings provides formatting options, but they won't achieve the result I'm looking for.
settings.Indent = true;
settings.NewLineOnAttributes = true;
These settings will put the first attribute on a newline, instead of keeping it on the same line as the node, and will line up attributes with the node.
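To make that behavior easy to reproduce, here is a small sketch that renders an element with the two settings above and returns the text. The class name NewLineOnAttributesDemo is hypothetical, and NewLineChars is pinned to "\n" only so the result is the same on every platform.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class NewLineOnAttributesDemo
{
    // Hypothetical helper: serializes one element using Indent and
    // NewLineOnAttributes so the resulting layout can be inspected.
    public static string Render(XElement element)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            NewLineOnAttributes = true,
            OmitXmlDeclaration = true,
            NewLineChars = "\n" // assumption: fixed newline for a stable result
        };

        var sw = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            element.Save(writer);
        }
        return sw.ToString();
    }
}
```

Rendering an Item element with Width and Height attributes this way shows every attribute, including the first, starting on its own line, which is not the layout I am after.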
Here is the Load call, which asks to preserve whitespace:
MyXml = XDocument.Load(filepath, LoadOptions.PreserveWhitespace);
But it seems like it won't do what I expected.
I tried to provide a custom class, which derives from XmlWriter to the XDocument.Save call, but I haven't managed to insert whitespace correctly without running into InvalidOperationException. Plus that solution seems overkill for the small addition I'm looking for.
For reference, this is my save call, not using my custom xml writer (which doesn't work anyway)
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.NewLineOnAttributes = true;
settings.OmitXmlDeclaration = true;
using (XmlWriter writer = XmlWriter.Create(filepath + "_auto", settings))
{
MyXml.Save(writer);
}
I ended up not using XDocument.Save at all. Instead I created a class that takes the XDocument, an XmlWriter, and a TextWriter.
The class walks all the nodes in the XDocument. The TextWriter is bound to the file on disk, and the XmlWriter uses it as its output.
My class then uses the XmlWriter to output the XML. To achieve the extra spacing, I used the solution described here, https://stackoverflow.com/a/24010544/5920497 , which is why I also need the underlying TextWriter.
Here's an example of the solution.
Calling the class to save the document:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.NewLineOnAttributes = false; // Behavior changed in PrettyXmlWriter
settings.OmitXmlDeclaration = true;
using(TextWriter rawwriter = File.CreateText(filepath))
using (XmlWriter writer = XmlWriter.Create(rawwriter, settings))
{
// rawwriter is used both by XmlWriter and PrettyXmlWriter
PrettyXmlWriter outputter = new PrettyXmlWriter(writer, rawwriter);
outputter.Write(MyXml);
writer.Flush();
writer.Close();
}
Inside PrettyXmlWriter:
private XmlWriter Writer { get; set; }
private TextWriter InnerTextWriter { get; set; }
public void Write(XDocument doc)
{
XElement root = doc.Root;
WriteNode(root, 0);
}
private void WriteNode(XNode node, int currentNodeDepth)
{
if(node.NodeType == XmlNodeType.Element)
{
WriteElement((XElement)node, currentNodeDepth);
}
else if(node.NodeType == XmlNodeType.Text)
{
WriteTextNode((XText)node, currentNodeDepth, doIndentAttributes);
}
}
private void WriteElement(XElement node, int currentNodeDepth)
{
Writer.WriteStartElement(node.Name.LocalName);
// Write attributes with indentation
XAttribute[] attributes = node.Attributes().ToArray();
if(attributes.Length > 0)
{
// First attribute, unindented.
Writer.WriteAttributeString(attributes[0].Name.LocalName, attributes[0].Value);
for(int i=1; i<attributes.Length; ++i)
{
// Write indentation
Writer.Flush();
string indentation = Writer.Settings.NewLineChars + string.Concat(Enumerable.Repeat(Writer.Settings.IndentChars, currentNodeDepth));
indentation += string.Concat(Enumerable.Repeat(" ", node.Name.LocalName.Length + 1));
// Using Underlying TextWriter trick to output whitespace
InnerTextWriter.Write(indentation);
Writer.WriteAttributeString(attributes[i].Name.LocalName, attributes[i].Value);
}
}
// output children
foreach(XNode child in node.Nodes())
{
WriteNode(child, currentNodeDepth + 1);
}
Writer.WriteEndElement();
}
Why does XmlSerializer populate my object property with an XmlNode array when deserializing an empty typed element using XmlNodeReader instead of an empty string like it does when using StringReader (or XmlTextReader)?
The second assertion in the following code sample fails:
var doc = new XmlDocument();
doc.Load(new StringReader(@"
<Test xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance""
xmlns:xsd=""http://www.w3.org/2001/XMLSchema"">
<Value xsi:type=""xsd:string"" />
</Test>"));
var ser = new XmlSerializer(typeof (Test));
var reader1 = new StringReader(doc.InnerXml);
var obj1 = (Test) ser.Deserialize(reader1);
Debug.Assert(obj1.Value is string);
var reader2 = new XmlNodeReader(doc.FirstChild);
var obj2 = (Test) ser.Deserialize(reader2);
Debug.Assert(obj2.Value is string);
public class Test
{
public object Value { get; set; }
}
I'm guessing it has something to do with the null internal NamespaceManager property but I'm not sure how to work around this mysterious limitation. How can I reliably deserialize a subset of my parsed XML document without converting it back into a string and re-parsing?
It looks like this is a very old XmlNodeReader bug that Microsoft have no intention of fixing. (Archived Microsoft Connect link here). I found a workaround on Lev Gimelfarb's blog here that adds namespaces to the reader's NameTable as prefixes are looked up.
public class ProperXmlNodeReader : XmlNodeReader
{
public ProperXmlNodeReader(XmlNode node) : base(node)
{
}
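public override string LookupNamespace(string prefix)
{
// NameTable.Add() throws on null, so pass unknown prefixes through unchanged.
string ns = base.LookupNamespace(prefix);
return ns == null ? null : NameTable.Add(ns);
}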
}
I can read and write an XML file into a DataSet with no problems, but if I save the DataSet with ds.WriteXml("Testdata.xml"), an additional tag <NewDataSet>.......</NewDataSet> is generated.
Is it possible to suppress the generation of this tag?
A DataSet can contain multiple tables, and a valid XML file must contain a single root node, which is why the output is wrapped in this node. You could specify the name of the root node when creating the DataSet:
var ds = new DataSet("root");
but if you want to remove it, you could first load the DataSet into an XDocument or XmlDocument, then extract the node you need and save it into a file.
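A minimal sketch of that approach, assuming the DataSet is small enough to hold in memory as a string; the helper name DataSetXmlHelper.GetRowsWithoutRoot is hypothetical. It writes the DataSet to a StringWriter, parses the result, and returns the elements under the root so the caller can save whichever one is needed.

```csharp
using System.Data;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class DataSetXmlHelper
{
    // Hypothetical helper: returns the row elements of a DataSet
    // without the wrapping root element (e.g. <NewDataSet>).
    public static XElement[] GetRowsWithoutRoot(DataSet ds)
    {
        var sw = new StringWriter();
        // Write the data only, without an inline schema.
        ds.WriteXml(sw, XmlWriteMode.IgnoreSchema);

        XDocument doc = XDocument.Parse(sw.ToString());
        // Each child of the root corresponds to one DataRow.
        return doc.Root.Elements().ToArray();
    }
}
```

Each returned XElement can be saved on its own with its Save() method. Note that several rows written one after another without a root element no longer form a well-formed XML document.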
I was looking to do the same thing in order to stream serialized DataRows without the overhead of tables and such. My solution was to use a temp DataTable as a sort of buffer which I fill with DataRows in chunks and then generate XML that I append to a stream. In order to reuse DataTable.WriteXml I had to solve the same problem and I wanted it to be efficient.
What I opted for is to create my own custom XmlWriter, which is a lot simpler than it sounds. It works by skipping over elements that meet the condition of a predicate. In this case the predicate is whether the element name is the same as the expected DataSet name.
var writer = new RootlessDataSetXmlWriter(
File.OpenWrite(@"C:\temp\ds.xml"),
"YourDataSetName");
dataSet.WriteXml(writer, XmlWriteMode.IgnoreSchema);
writer.Flush();
writer.Close();
Below is the implementation of RootlessDataSetXmlWriter and the ElementSkippingXmlWriter base class.
public class RootlessDataSetXmlWriter : ElementSkippingXmlWriter
{
private string _dataSetName;
public RootlessDataSetXmlWriter(Stream stream, string dataSetName)
: base(stream, (e) => string.Equals(e, dataSetName, StringComparison.OrdinalIgnoreCase))
{
_dataSetName = dataSetName;
this.Formatting = System.Xml.Formatting.Indented;
}
}
public class ElementSkippingXmlWriter : XmlTextWriter
{
private Predicate<string> _elementFilter;
private int _currentElementDepth;
private Stack<int> _sightedElementDepths;
public ElementSkippingXmlWriter(Stream stream, Predicate<string> elementFilter)
: base(stream, Encoding.UTF8)
{
_elementFilter = elementFilter;
_sightedElementDepths = new Stack<int>();
}
public override void WriteStartElement(string prefix, string localName, string ns)
{
if (_elementFilter(localName))
{
// Skip the root elements
_sightedElementDepths.Push(_currentElementDepth);
}
else
{
base.WriteStartElement(prefix, localName, ns);
}
_currentElementDepth++;
}
public override void WriteEndElement()
{
_currentElementDepth--;
if (_sightedElementDepths.Count > 0 && _sightedElementDepths.Peek() == _currentElementDepth)
{
_sightedElementDepths.Pop();
return;
}
base.WriteEndElement();
}
}
For those "efficient" like me, here are the simple XDocument and XmlDocument codez (live examples at https://dotnetfiddle.net/s2VP0k). Though XmlDocument was typically faster in my simple tests, I prefer the XDocument's simplicity.
XDocument:
return System.Xml.Linq.XElement
.Parse(xmlString)
.FirstNode
.ToString(); // <c><a>123</a></c>
XmlDocument:
var xmlDoc = new System.Xml.XmlDocument();
xmlDoc.LoadXml(xmlString);
return xmlDoc.DocumentElement.InnerXml; // <c><a>123</a></c>
I have this code:
public class Hero
{
XmlReader Reader = new XmlTextReader("InformationRepositories/HeroRepository/HeroInformation.xml");
XmlReaderSettings XMLSettings = new XmlReaderSettings();
public ImageSource GetHeroIcon(string Name)
{
XMLSettings.IgnoreWhitespace = true;
XMLSettings.IgnoreComments = true;
Reader.MoveToAttribute(" //I'm pretty much stuck here.
}
}
And this is the XML file I want to read from:
<?xml version="1.0" encoding="utf-8" ?>
<Hero>
<Legion>
<Andromeda>
<HeroType>Agility</HeroType>
<Damage>39-53</Damage>
<Armor>3.1</Armor>
<MoveSpeed>295</MoveSpeed>
<AttackType>Ranged(400)</AttackType>
<AttackRate>.75</AttackRate>
<Strength>16</Strength>
<Agility>27</Agility>
<Intelligence>15</Intelligence>
<Icon>Images/Hero/Andromeda.gif</Icon>
</Andromeda>
</Legion>
<Hellbourne>
</Hellbourne>
</Hero>
I'm trying to get the <Icon> element.
MoveToAttribute() won't help you, because everything in your XML is elements. The Icon element is a subelement of the Andromeda element.
One of the easiest ways of navigating an XML document if you're using the pre-3.5 xml handling is by using an XPathNavigator. See this example for getting started, but basically you just need to create it and call MoveToChild() or MoveToFollowing() and it'll get you to where you want to be in the document.
XmlDocument doc = new XmlDocument();
doc.Load("InformationRepositories/HeroRepository/HeroInformation.xml");
XPathNavigator nav = doc.CreateNavigator();
if (nav.MoveToFollowing("Icon",""))
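Response.Write(nav.Value); // Icon holds a path string, so read Value rather than ValueAsInt
Note that MoveToFollowing() only moves forward in document order, so it can be problematic if you need to loop or seek back through the document; in that case you have to reposition the navigator yourself (for example with MoveToRoot() or MoveToParent()).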
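If you would rather address the element directly than step through the document, an XPath expression is another option. The following is a minimal sketch, not the only way to do it: the class name HeroIconLookup and the method GetIconPath are hypothetical, and it assumes heroName is a trusted, valid element name because it is concatenated into the XPath expression.

```csharp
using System.IO;
using System.Xml.XPath;

public static class HeroIconLookup
{
    // Hypothetical helper: returns the text of the <Icon> element for the
    // given hero, or null when no such element exists.
    // Assumption: heroName is a trusted, valid XML element name.
    public static string GetIconPath(string xml, string heroName)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        // /Hero/<any faction>/<heroName>/Icon
        XPathNavigator icon = nav.SelectSingleNode("/Hero/*/" + heroName + "/Icon");
        return icon?.Value;
    }
}
```

With the file from the question loaded into a string, GetIconPath(xml, "Andromeda") would look under any faction element (Legion, Hellbourne) for that hero.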
If you're just reading XML to put the values into objects, you should seriously consider doing this automatically via object serialization to XML. This would give you a painless and automatic way to load your xml files back into objects.
Mark your attributes in your object according to the element you want to load them to:
See: http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlattributeattribute.aspx
If, for some reason, you can't do this to your current object, consider making a bridge object which mirrors your original object and add an AsOriginal() method which returns the original object.
Working off the msdn example:
public class GroupBridge
{
[XmlAttribute (Namespace = "http://www.cpandl.com")]
public string GroupName;
[XmlAttribute(DataType = "base64Binary")]
public Byte [] GroupNumber;
[XmlAttribute(DataType = "date", AttributeName = "CreationDate")]
public DateTime Today;
public Group AsOriginal()
{
Group g = new Group();
g.GroupName = this.GroupName;
g.GroupNumber = this.GroupNumber;
g.Today = this.Today;
return g;
}
}
public class Group
{
public string GroupName;
public Byte [] GroupNumber;
public DateTime Today;
}
To Serialize and DeSerialize from LINQ objects, you can use:
public static string SerializeLINQtoXML<T>(T linqObject)
{
// see http://msdn.microsoft.com/en-us/library/bb546184.aspx
DataContractSerializer dcs = new DataContractSerializer(linqObject.GetType());
StringBuilder sb = new StringBuilder();
XmlWriter writer = XmlWriter.Create(sb);
dcs.WriteObject(writer, linqObject);
writer.Close();
return sb.ToString();
}
public static T DeserializeLINQfromXML<T>(string input)
{
DataContractSerializer dcs = new DataContractSerializer(typeof(T));
TextReader treader = new StringReader(input);
XmlReader reader = XmlReader.Create(treader);
T linqObject = (T)dcs.ReadObject(reader, true);
reader.Close();
return linqObject;
}
I don't have any example code of Serialization from non-LINQ objects, but the MSDN link should point you in the right direction.
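For what it's worth, here is a rough sketch of what that could look like for the XML in the question using XmlSerializer. Everything named here is an assumption for illustration: HeroStats is a hypothetical class covering only three of the child elements, and HeroLoader.Load is a hypothetical helper. Because the hero's name is used as the element name in that file, the sketch passes the name in through an XmlRootAttribute.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical class mirroring a few of the hero's child elements.
// Elements without a matching property (Damage, Armor, ...) are skipped.
public class HeroStats
{
    public string HeroType { get; set; }
    public int Strength { get; set; }
    public string Icon { get; set; }
}

public static class HeroLoader
{
    // Hypothetical helper: finds the element named after the hero and
    // deserializes it into a HeroStats instance.
    public static HeroStats Load(string xml, string heroName)
    {
        XElement heroElement = XDocument.Parse(xml).Descendants(heroName).First();

        // The root element name varies per hero, so supply it at runtime.
        var serializer = new XmlSerializer(typeof(HeroStats), new XmlRootAttribute(heroName));
        using (XmlReader reader = heroElement.CreateReader())
        {
            return (HeroStats)serializer.Deserialize(reader);
        }
    }
}
```

HeroLoader.Load(xml, "Andromeda").Icon would then give the icon path. If the file format is under your control, a fixed element name with the hero name as an attribute would make the mapping simpler.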
You can use linq to xml:
public class XmlTest
{
private XDocument _doc;
public XmlTest(string xml)
{
_doc = XDocument.Load(new StringReader(xml));
}
public string Icon { get { return GetValue("Icon"); } }
private string GetValue(string elementName)
{
// Returns null instead of throwing when the element is missing.
return _doc.Descendants(elementName).FirstOrDefault()?.Value;
}
}
You can use the regular expression "<Icon>.*?</Icon>" to find all the icons,
then just remove the tags and use what is left.
It would be a lot shorter:
Regex rgx = new Regex("<Icon>.*?</Icon>"); // lazy match, so two icons on one line stay separate
MatchCollection matches = rgx.Matches(xml);
foreach (Match match in matches)
{
string s = match.Value;
s = s.Remove(0, 6); // strip the leading "<Icon>"
s = s.Remove(s.LastIndexOf("</Icon>"), 7); // strip the trailing "</Icon>"
Console.WriteLine(s);
}