XML formatting for later parsing in c#

XML formatting for later parsing in c# - c#

I'm using XML to hold some settings, so let's go with the following example:
<PermissionLevels>
<Permission Name="MyPermission 1" Bases="XXX" />
<Permission Name="MyPermission 2" Bases="XXX" />
</PermissionLevels>
So the XXX is my problem. I will have several base permissions, e.g. "Read, Create, Delete", and I know C# has the .Split method, so I could probably try it with "," as a delimiter. I could also have child nodes to <Permission>.
Is there a correct way to do it? With correct I mean the following: I'm not sure whether I should use LINQ (XDocument) or the regular System.Xml variant (XmlDocument) and I am not sure if some different way might be better to handle.
With .Split I will have to take the node.Attributes["Base"].Value, split it and do a foreach for all elements in the array - maybe LINQ is smarter and can do something differently?
Sorry for my rambling, this is just not my domain. The question actually is: When I have multiple values for one attribute, do I concatenate them with e.g. a ",", or do I do it differently (e.g. with child nodes of the object)?

I would prefer child nodes. Attributes are meant for scalar (single) values.
Also, this way your source code in C# with LINQ will be simple and that's good.

Related

XML Schema - Which Is More Correct?

supposing O have these two equivalent (at least they are supposed to be) XML schemas. The actual XML will eventually be parsed by C#. I think the second way is 'more correct' since I will get attributes as actual attrbutes, instead of child elements, correct?
<?xml version="1.0" encoding="ISO-8859-1"?>
<switch>
<switch_name>switch1</switch_name>
<software_version>1</software_version>
<vendor>Cisco</vendor>
<ip_address>1.1.1.1</ipaddress>
<linecard>
<model_type>12345</model_type>
<fcport>
<slot> 1</slot>
<port> 1</port>
<speed>4</speed>
</fcport>
</linecard>
</switch>
<switch>
<switch name="switch1" version="1" vendor="Cisco" ip_address="1.1.1.1">
<linecard model="12345">
<fcport slot="1" port="1" speed="4">
</fcport>
<linecard>
</switch>
</xml>

Neither one is strictly more "correct" than the other, both will work for your example. Neither breaks any rules.
That said, I think I agree with W3Schools on this one, in that data should go inside child elements rather than attributes. Especially things like IP addresses just FEEL like data that should be a child element rather than an attribute. Attributes I typically use for metadata, such as auto generated IDs.
This is especially true if you later want to account for expansion -- for example, what if you want to associate multiple IPs? With child elements you can just add another element, but with attributes you have to come up with a new attribute name for each addition (ip1, ip2, ip3...).

There are no "right" way of representing data in XML when choosing between using elements or attributes for properties of an entity. Choose whatever works for you.
Generally elements give more freedom as you may have sub-elements eventually. I.e. if property is list of some sort representing it as comma-separated value in attribute looks very non-XML.
Side note: "XML schema" usually means different thing - structured schema for XML... what you have I'd call "representation of data in XML".

The classic article on how to choose between elements and attributes is here:
http://xml.coverpages.org/elementsAndAttrs.html
I note that at the end of the page it quotes John Cowan quoting me: "Beginnners always ask this question. Those with a little experience express their opinions passionately. Experts tell you there is no right answer'."

Writing xml with at most two tags per line

I am saving xml from .NET's XElement. I've been using the method ToString, but the formatting doesn't look how I'd like (examples below). I'd like at most two tags per line. How can I achieve that?
Saving XElement.Parse("<a><b><c>one</c><c>two</c></b><b>three<c>four</c><c>five</c></b></a>").ToString() gives me
<a>
<b>
<c>one</c>
<c>two</c>
</b>
<b>three<c>four</c><c>five</c></b>
</a>
But for readability I would rather 'three', 'four' and 'five' were on separate lines:
<a>
<b>
<c>one</c>
<c>two</c>
</b>
<b>three
<c>four</c>
<c>five</c>
</b>
</a>
Edit: Yes I understand this is syntactically different and "not in the spirit of xml", but I'm being pragmatic. Recently I've seen megabyte-size xml files with as few as 3 lines—these are challenging to text editors, source control, and diff tools. Something needs to be done! I've tested that changing the formatting above is compatible with our application.

If you want exactly that output, you'll need to do it manually, adding whitespace around nodes as necessary.
Almost all whitespace in XML documents is significant, even if we only think of it as indenting. When we ask the serializer to indent the document for us, it is making changes to the content that can get extracted, so they try to be as conservative as possible. The elements
<tag>foo</tag>
and
<tag>
foo
</tag>
have different content, and if an serializer changed the former into the latter, it would change what you get back from your XML API when asking for the contents of <tag>.
The usual rule of thumb is that no indenting will be applied if there's any existing non-whitespace between the elements. In this case, your three between the tags would be modified if a serializer applied the indenting you desire, so nothing will do it for you automatically.
If you have control over the XML format, it's inadvisable to mix element and text children like this, where <b> has both text (three) and element (<c>) children, as it causes issues like what you're seeing.

The formatting isn't working the way you want because of the naked "three". Is there a reason it's not in it's own tag? Should it be an attribute of "b" instead?

Explained reasons to colleagues - we're going to change the file format. I recommend you try to do the same. It's nigh impossible to do what I wanted, because most xml tools assume whitespace is significant.

XML is an information exchange format, intended for computers. The whitespace is irrelevant (depending on location and schema, really) and as such, it would be arbitrary to use one or the other.
You could use XmlTextWriter with XElement.Save and see whether you can tweak it to your liking with the XmlWriter.Settings Property

I've had to do something similar before (for a client request). All I ended up doing was writing a custom .ToString() method only used for either displaying the XML in a browser(ugh, i know) or for their use in downloading an xml file of the content. Because the code did not have to be computationally efficient, it was merely a matter of checking the children of each tag and arranging the 'hanging' text as such.
Eventually we were able to convince the user that the text should be an attribute instead.

Get list of XSD elements in C#

I have an XSD file and want to get a list of the names of all the elements in it. I don't mean stuff like <xs:sequence> and so on, just the "real stuff", that actually can appear in XML that are valid according to the XSD.

Real stuff is a bit vague
But if you just want want all elements it's just a it of Xpath.
If you want a tree, then you can't avoid sequence etc.
If you have things like xs:choice in there you have even more issues.
Then there's attributes...
From SimpleContent or ComplexType...
Might be easier to generate a 'blank' xml document from the xsd and then get what you want out of that. That's a fair chunk of code as well though. Might be one lying around you can borrow though.

If you don't actually want to do this from your code, you could use the XML Schema Definition Tool (Xsd.exe) to create source code for runtime objects.
From there you can use Xml serialization to create valid Xml samples for your given Xsd schema.

Since you're trying to code for this, I would assume you want to do this against different XML Schema files, over and over; if true, it would be then important to understand if you really have to embed this in your codebase, or if it can be used as an external tool.
If you really want to do it, most of all you need is in System.Xml.Schema package. Start with an XmlSchemaSet to load and compile your XSD files. Then using an iterator on GlobalElements, go over the global elements that can show as your root elements in XML document and traverse those (for what you need, use the PSVI properties); as someone else was mentioning, there will be types to go through, compositors, etc.; and then there's more: abstract elements (those can't show up in XML, neither references to abstract elements, instead members of substitution groups), prohibited attributes, restricted types, etc.
I've recently answered another post that may be related to your need; your posted XML Schema may look like this:
root/ship/engine/#MaxSpeed,A,1..1,True
root/ship/crew/#function,A,1..1,True
root/ship/#Name,A,1..1,True
root/ship/#class,A,1..1,True
root/ship/special_abilities/hull/#separable,A,0..1,False
root/ship/special_abilities/hull/#canCarryWesley,A,0..1,False
root/ship/special_abilities/hull/#capableOfLanding,A,0..1,False
If you want, you can deal only with the first column; the generated XPath shows only those items (elements or attributes) that have data; processing something like the above might be much easier (split the string using /, elements are all but #, etc.)

replacing substring inside attributes of XmlDocument

I'm using C# with .net 3.5 and have a few cases where I want to replace some substrings in the XML attributes of an XmlDocument with something else.
One case is to replace the single quote character with ' and the other is to clean up some files that contain valid XML but the attributes' values are no longer appropriate (say replace anything attribute which starts with "myMachine" with "newMachine").
Is there a simple way to do this, or do I need to go through each attribute of every node (recursively)?

One way to approach it is to select a list of the correct elements using Linq to XML, and then iterate over that list. Here's an example one-liner:
XDocument doc = XDocument.Load(path);
doc.XPathSelectElements("//element[#attribute-name = 'myMachine']").ToList().ForEach(x => x.SetAttributeValue("attribute-name", "newMachine"));
You could also do a more traditional iteration.

I suggest taking a look at LINQ to XML. There's a collection of code snippets that can help you get started here - LINQ To XML Tutorials with Examples
LINQ to XML should allow you to do what you're looking to do, and you'll probably find it easy once you've played with it a bit.

Best way to replace XML Text

I have a web service which returns the following XML:
<Validacion>
<Es_Valido>NK7+22XrSgJout+ZeCq5IA==</Es_Valido>
</Validacion>
<Estatus>
<Estatus>dqrQ7VtQmNFXmXmWlZTL7A==</Estatus>
</Estatus>
<Generales>
<Nombre>V4wb2/tq9tEHW80tFkS3knO8i4yTpJzh7Jqi9MxpVVE=</Nombre>
<Apellido>jXyRpjDQvsnzZz+wsq6b42amyTg2np0wckLmQjQx1rCJc8d3dDg6toSdSX200eGi</Apellido>
<Ident_Clie>IYbofEiD+wOCJ+ujYTUxgsWJTnGfVU+jcQyhzgQralM=</Ident_Clie> <Fec_Creacion>hMI2YyE5h2JVp8CupWfjLy24W7LstxgmlGoDYjPev0r8TUf2Tav9MBmA2Xd9Pe8c</Fec_Creacion>
<Nom_Asoc>CF/KXngDNY+nT99n1ITBJJDb08/wdou3e9znoVaCU3dlTQi/6EmceDHUbvAAvxsKH9MUeLtbCIzqpJq74e QfpA==</Nom_Asoc>
<Fec_Defuncion />
</Generales>
The text inside the tags in encrypted, I need to decrypt the text, I've come up with a regular expressions solution but I don't think it's very optimal, is there a better way to do this? thanks!

I wouldn't use a regular expression. Load the XML with something like LINQ to XML, find every element which just has a text child, and replace the contents of that child with the decrypted form.
Do you know which elements will be encrypted? That would make it even easier. Basically you'll want something along the lines of:
// It's possible that modifying elements while executing Descendants()
// would be okay, but I'm not sure
List<XElement> elements = doc.Descendants().ToList();
foreach (XElement element in elements)
{
if (ShouldDecrypt(element)) // Whatever this would do
{
element.Value = Decrypt(element.Value);
}
}
(I'm assuming you know how to do the actual decryption part, of course.)

Never ever use regular expressions to parse XML. XmlReader and XmlDocument, both found inside System.Xml, provide a way better way to parse XML.

Do you know the type of encryption used? Look here to get the basics on the Cryptology capabilities in .NET

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

XML formatting for later parsing in c# - c#

I would prefer child nodes. Attributes are meant for scalar (single) values. Also, this way your source code in C# with LINQ will be simple and that's good.

Related

XML Schema - Which Is More Correct?

Writing xml with at most two tags per line

Get list of XSD elements in C#

replacing substring inside attributes of XmlDocument

Best way to replace XML Text

Categories

Resources