How do I extract information from a very generic XML document? - C#

My XML document is as follows (but much larger and nothing to do with date-of-birth info). It is produced by a method, and I want to access information from it.
I would like to look up a "Name" like John or Hannah and get DOB from the next line and perhaps put them into a Dictionary<string,string> for later look-up.
The document is ALWAYS in this form and Names are unique. I only need the first section name = "DOBs".
<info>
<section name="DOBs">
<table>
<headers>
<cell value="Name" />
<cell value="DOB" />
</headers>
<row>
<cell value="John" />
<cell value="121245" />
</row>
<row>
<cell value="Hannah" />
<cell value="050595" />
</row>
</table>
</section>
<section name="notDOBs">
<table>
<headers>
<cell value="Name" />
<cell value="other info" />
</headers>
<row>
<cell value="Hannah" />
<cell value="blue" />
</row>
....
</table>
</section>
</info>
What have I tried? To be honest, not a lot, as I'm confused about XML in general. I've looked at a few SO entries: Xml Reader, This one and How to use XMLreader to read this xml. I also looked at csharp-examples.net, which looked good, but I kept getting errors about "not leading to node set".
I got excited when I found this: get xml attribut values, but no joy; it doesn't seem to find them, only the ones.
var myXml = XElement.Load(_myPath).Elements();
var myArray = myXml.Elements("cell").Attributes("value").Select(n => n.Value);
I know I have a LOT more work to do but sometimes you just have to bite the bullet and ask ... Can anyone help?

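using System.Linq;
using System.Xml.Linq;

var list = XDocument.Load(@"C:\test.xml")
    .Descendants("section")
    .Where(e => (string)e.Attribute("name") == "DOBs")  // only the "DOBs" section
    .Descendants("row")                                  // skips the <headers> cells
    .Descendants("cell")
    .Attributes("value")
    .Select(a => a.Value)                                // strings, not XAttribute objects
    .ToList();
// Cells come in (Name, DOB) pairs: even indexes are names, odd indexes are DOBs.
var dic = Enumerable.Range(0, list.Count / 2)
    .ToDictionary(i => list[2 * i], i => list[2 * i + 1]);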
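Reading the value attributes into one flat list works as long as every row has exactly two cells. A slightly more defensive variant, sketched below under the assumption that the first cell of each row is always the name and the second the DOB, pairs the cells row by row instead. `DobLookup` and `Build` are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DobLookup
{
    // Builds a Name -> DOB dictionary from the <section> whose name attribute matches.
    // Each <row> is read on its own, so one malformed row cannot shift
    // every later pair the way it would in a single flat list of values.
    public static Dictionary<string, string> Build(XDocument doc, string sectionName = "DOBs")
    {
        // First() throws if the section is missing; use FirstOrDefault() to handle that case yourself.
        XElement section = doc.Descendants("section")
            .First(s => (string)s.Attribute("name") == sectionName);

        return section.Descendants("row")                 // header cells are not inside <row>
            .Select(row => row.Elements("cell")
                              .Select(c => (string)c.Attribute("value"))
                              .ToArray())
            .Where(cells => cells.Length >= 2)            // ignore rows without a name/DOB pair
            .ToDictionary(cells => cells[0], cells => cells[1]);
    }
}
```

Usage: `var dobs = DobLookup.Build(XDocument.Load(path)); string johnDob = dobs["John"];`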

Sort XML nodes alphabetically on attribute name
Check the sample above.
With an XPathNavigator, try these members:
.MoveToFirstChild() method
.MoveToNext() method
.MoveToFirstAttribute() method
.MoveToNextAttribute() method
.Name property
.Value property
Each Move... method returns false when there is no node to move to, so check that return value before reading .Name or .Value.
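A minimal sketch of those navigator members in use, assuming the sample document from the question; the `/info/section[@name='DOBs']/table/row` path and the `NavigatorDobs`/`CellValues` names are illustrative. The rows are enumerated with `XPathNodeIterator.MoveNext()`, while cells and attributes are walked with the navigator's own `Move...` methods, checking each boolean result.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class NavigatorDobs
{
    // Reads Name -> DOB pairs from the "DOBs" section of the question's document.
    public static Dictionary<string, string> Read(string xml)
    {
        var result = new Dictionary<string, string>();
        XPathNavigator root = new XPathDocument(new StringReader(xml)).CreateNavigator();
        XPathNodeIterator rows = root.Select("/info/section[@name='DOBs']/table/row");
        while (rows.MoveNext())                          // iterator method, not a navigator one
        {
            List<string> cells = CellValues(rows.Current);
            if (cells.Count >= 2) result[cells[0]] = cells[1];
        }
        return result;
    }

    // Collects the "value" attribute of each <cell> child of a row.
    static List<string> CellValues(XPathNavigator row)
    {
        var values = new List<string>();
        XPathNavigator nav = row.Clone();                // don't move the caller's navigator
        if (!nav.MoveToFirstChild()) return values;      // false: the row has no children
        do
        {
            if (nav.NodeType != XPathNodeType.Element || nav.Name != "cell") continue;
            XPathNavigator attr = nav.Clone();
            if (!attr.MoveToFirstAttribute()) continue;  // false: the cell has no attributes
            do
            {
                if (attr.Name == "value") values.Add(attr.Value);
            } while (attr.MoveToNextAttribute());        // false: no more attributes
        } while (nav.MoveToNext());                      // false: no more siblings
        return values;
    }
}
```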

Related

Where are Visio Master Shape properties stored?

I have a program that parses the XML data of a Visio file. In this file there is a grouped shape consisting of several master shapes. Like so:
Each shape has a Property called Pin
Pin 1 is the default value saved in the master shape. When I unzip the Visio file and look at the XML data, the "Pin" property will not show up on pin 1, but it will be there for all the other pins.
<PageContents xmlns="http://schemas.microsoft.com/office/visio/2012/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xml:space="preserve">
<Shapes>
<Shape ID="2010" Type="Group" LineStyle="7" FillStyle="7" TextStyle="7" UniqueID="{B849B0B2-94FC-4CC7-843F-6A55BDBD37E6}">
<Cell N="PinX" V="8.484814432615094"/>
<...etc>
<Section N="Property">
<Row N="REF">
<Cell N="Value" V="X999" U="STR"/>
<...etc>
</Row>
</Section>
<Shapes>
<Shape ID="2" NameU="Pin.1994" IsCustomNameU="1" Name="Pin.1994" IsCustomName="1" Type="Group" Master="126" UniqueID="{216A72DB-F8E9-4C30-9C34-DE9A8448552B}">
<Cell N="PinX" V="0.07874015748031506" F="Sheet.1!Width*0.5"/>
<...etc>
<Shapes>
<Text callout and background shapes>
</Shapes>
</Shape>
<Shape ID="6" NameU="Pin.2002" IsCustomNameU="1" Name="Pin.2002" IsCustomName="1" Type="Group" Master="126">
<Cell N="PinX" V="0.07874015748031506" F="Sheet.1!Width*0.5"/>
<...etc>
<Section N="Property">
<Row N="Pin">
<Cell N="Value" V="2" U="STR"/>
</Row>
</Section>
<Shapes>
<Text callout and background shapes>
</Shapes>
</Shape>
</Shapes>
</Shape>
</Shapes>
</PageContents>
If I rename the "Pin" property to anything other than "1" the property will show up just like it does on Pin 2. I thought this was because it was stored in the Master Shape, but there is no "Property" tag in the master file either.
<MasterContents xmlns="http://schemas.microsoft.com/office/visio/2012/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xml:space="preserve">
<Shapes>
<Shape ID="5" Type="Group" LineStyle="0" FillStyle="0" TextStyle="0" UniqueID="{F811FFC2-FDBC-4EFF-97CF-13F5FFBC677C}">
<Cell N="PinX" V="0"/>
<...etc>
<Section N="User">...</Section>
<Section N="Geometry">...</Section>
<Shapes>
<Shape ID="6" NameU="Text callout" IsCustomNameU="1" Name="Text callout" IsCustomName="1" Type="Group" LineStyle="3" FillStyle="3" TextStyle="3" UniqueID="{4CF654FB-78A6-413C-A551-70A86FC63644}">...</Shape>
</Shapes>
</Shapes>
</MasterContents>
Since Visio displays the value, it must get the property name and value from somewhere, but I have no idea where.
When I parse the file, I look for the "Pin" property and extract data from other properties in the shape, but when the "Pin" property is not present, my parser skips all that information for every Pin 1 in the document.
I will attach the complete xml files here if anyone wants to have a look at them.
Property renamed to "1 "
Property missing
Master126
*Edit: Zipfile with all XML files
*Edit2: VSDX file
Thanks for the vsdx, that's helpful.
As you highlighted, the Pin Shape Data row in 'Pin1' shape doesn't show up in the instance shape xml (PageContents) as it is an inherited value from its master. The other two shapes, having local values, are reflected within the instance xml.
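To make that fallback concrete, here is a LINQ to XML sketch that reads a Shape Data value from the instance shape first and only consults the master shape when the instance has no local Value cell. It assumes you have already located the matching shape in the master part (for a top-level instance, the top-level shape of that master) and that both parts use the Visio 2012 main namespace shown in the question. `ShapeData` and `GetShapeDataValue` are hypothetical names, and other inheritance layers (such as styles) are ignored.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ShapeData
{
    // Namespace used by the page and master parts in the question's XML.
    static readonly XNamespace V = "http://schemas.microsoft.com/office/visio/2012/main";

    // Value cell of the named row in the shape's Property (Shape Data) section, or null.
    static string GetRowValue(XElement shape, string rowName) =>
        shape.Elements(V + "Section")
             .Where(s => (string)s.Attribute("N") == "Property")
             .Elements(V + "Row")
             .Where(r => (string)r.Attribute("N") == rowName)
             .Elements(V + "Cell")
             .Where(c => (string)c.Attribute("N") == "Value")
             .Select(c => (string)c.Attribute("V"))
             .FirstOrDefault();

    // Local value wins; if the instance has no Value cell for this row, it is inherited
    // from the master shape (assumption: masterShape is the correct matching shape).
    public static string GetShapeDataValue(XElement instanceShape, XElement masterShape, string rowName) =>
        GetRowValue(instanceShape, rowName)
        ?? (masterShape == null ? null : GetRowValue(masterShape, rowName));
}
```

Because the check is per cell, a row that overrides other cells (for example the Label) but not Value still falls back to the master's Value.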
I think the problem you're having is you're looking at the wrong master and so not finding the data you're after.
The way to trace this back: if you look at the page XML (page1.xml), you'll see that the Pin shape is based on master ID '6':
[Note - I've cut out quite a lot of the xml in the following snippets to give a clearer picture of the structure for the file.]
<PageContents>
<Shapes>
<Shape ID='17' Type='Group' LineStyle='7' FillStyle='7' TextStyle='7'>
<Shapes>
<Shape ID='5' NameU='Pin' Name='Pin' Type='Group' Master='6'>
Now you can look in the masters collection (masters.xml) and will see that the master with an ID attribute of 6 (the 'Pin' master) has a rel id of 'rId2':
<Masters>
<Master ID='2' NameU='Dynamic connector' IsCustomNameU='1' Name='Dynamic connector' IsCustomName='1'>
<Rel r:id='rId1'/>
</Master>
<Master ID='6' NameU='Pin' IsCustomNameU='1' Name='Pin' IsCustomName='1'>
<Rel r:id='rId2'/>
</Master>
Now that you've got the correct rel id, you can look up the matching master declaration in masters.xml.rels, where you'll see that rel id 'rId2' points to master2.xml:
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3" Type="http://schemas.microsoft.com/visio/2010/relationships/master" Target="master3.xml"/>
<Relationship Id="rId2" Type="http://schemas.microsoft.com/visio/2010/relationships/master" Target="master2.xml"/>
<Relationship Id="rId1" Type="http://schemas.microsoft.com/visio/2010/relationships/master" Target="master1.xml"/>
<Relationship Id="rId5" Type="http://schemas.microsoft.com/visio/2010/relationships/master" Target="master5.xml"/>
<Relationship Id="rId4" Type="http://schemas.microsoft.com/visio/2010/relationships/master" Target="master4.xml"/>
</Relationships>
So your final stop is to head off to master2.xml where you should find that the top level shape (id 5) has a Shape Data row named 'Pin' and a value of '1':
<MasterContents>
<Shapes>
<Shape ID='5' NameU='Pin.473' IsCustomNameU='1' Name='Pin.473' IsCustomName='1' Type='Group'>
<Section N='Property'>
<Row N='Pin'>
<Cell N='Value' V='1' U='STR'/>
<Cell N='Prompt' V='' F='No Formula'/>
<Cell N='Label' V='Pin'/>
<Cell N='Format' V='' F='No Formula'/>
<Cell N='SortKey' V='' F='No Formula'/>
<Cell N='Type' V='0'/>
<Cell N='Invisible' V='0' F='No Formula'/>
<Cell N='Verify' V='0' F='No Formula'/>
<Cell N='DataLinked' V='0' F='No Formula'/>
<Cell N='LangID' V='sv-SE'/>
<Cell N='Calendar' V='0' F='No Formula'/>
</Row>
</Section>
I'm guessing that you're treating the vsdx as a zip and that you're missing out on the System.IO.Packaging namespace which will help you with navigating the package relationships. I'll add this link just in case:
Manipulate the Visio file format programmatically
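If you would rather stay with plain LINQ to XML, the lookup chain above (the page shape's Master attribute, then the Master element's Rel r:id in masters.xml, then the matching Relationship in the masters relationship part) can be sketched as below. It assumes both parts are already loaded as XDocuments (in a typical VSDX package they live at visio/masters/masters.xml and visio/masters/_rels/masters.xml.rels). `MasterResolver` is a hypothetical helper; it matches element names by local name so it does not depend on which default namespace is declared.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class MasterResolver
{
    // Namespace of the r:id attribute on <Rel>.
    static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // masterId: the Master attribute of a <Shape> in a page part.
    // Returns the relationship Target (e.g. "master2.xml"), or null if it cannot be resolved.
    public static string ResolveMasterPart(XDocument mastersXml, XDocument mastersRels, string masterId)
    {
        string relId = mastersXml.Root
            .Elements()
            .Where(m => m.Name.LocalName == "Master" && (string)m.Attribute("ID") == masterId)
            .Elements()
            .Where(e => e.Name.LocalName == "Rel")
            .Select(e => (string)e.Attribute(R + "id"))
            .FirstOrDefault();
        if (relId == null) return null;

        return mastersRels.Root
            .Elements()
            .Where(r => r.Name.LocalName == "Relationship" && (string)r.Attribute("Id") == relId)
            .Select(r => (string)r.Attribute("Target"))
            .FirstOrDefault();
    }
}
```

The returned Target is relative to the masters folder, so "master2.xml" would be read from visio/masters/master2.xml in the same package.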

Using LINQ how would I check if a XML row contains a field and if it does check a different field on the same row for a value

I have a scenario where an application is spitting out some XML to me, and I don't have any control over its structure. I have posted it below.
<?xml version="1.0"?>
<DATASET xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ROW>
<OccNumber >test</OccNumber>
<OccId >Test2</OccId>
<OccTime >2017/01/26 09:38</OccTime>
<OccSummary >Test worked</OccSummary>
<DATASET>
<ROW>
<PID>123456</PID>
<CID >12345678</CID>
</ROW>
<ROW>
<PID>569867</PID>
<CID>37576334</CID>
</ROW>
<DATASET>
<ROW>
<ReportId >4345454</ReportId>
<ReportTime >2018/02/15 12:55</ReportTime>
<NumberType4 />
<accepted >Yes</accepted>
<cond1>No </Cond1>
</ROW>
</DATASET>
</ROW>
</DATASET>
So what I need to do is basically count every time I find a tag and, within the same row, the tag has a value of "Yes". I would like to do this with LINQ if possible.
EDIT:
To be a bit clearer: I am assuming that because the DATASET and ROW tags repeat and don't have unique names, I can't identify a specific one to check for my count. What I do know is that the ACCEPTED and COND1 tags are unique to the dataset and row I am interested in counting. I also need to check the values in both fields, as it's the combination of what I find that tells me whether I should count the row.
Your XML isn't valid (missing closing tag on one of the DATASET elements, different case in the cond1 element). Below is some corrected XML with additional nodes that can be used for testing and some code beneath it showing how you can select the nodes you're interested in.
<?xml version="1.0"?>
<DATASET xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ROW>
<OccNumber >test</OccNumber>
<OccId >Test2</OccId>
<OccTime >2017/01/26 09:38</OccTime>
<OccSummary >Test worked</OccSummary>
<DATASET>
<ROW>
<ReportId >4345454</ReportId>
<ReportTime >2018/02/15 12:55</ReportTime>
<NumberType4 />
<cond1>No</cond1>
</ROW>
</DATASET>
<DATASET>
<ROW>
<ReportId >4345454</ReportId>
<ReportTime >2018/02/15 12:55</ReportTime>
<NumberType4 />
<accepted>Yes</accepted>
<cond1>No</cond1>
</ROW>
</DATASET>
<DATASET>
<ROW>
<PID>123456</PID>
<CID >12345678</CID>
</ROW>
<ROW>
<PID>569867</PID>
<CID>37576334</CID>
</ROW>
</DATASET>
<DATASET>
<ROW>
<ReportId >4345454</ReportId>
<ReportTime >2018/02/15 12:55</ReportTime>
<NumberType4 />
<accepted>Yes</accepted>
<cond1>No</cond1>
</ROW>
</DATASET>
</ROW>
</DATASET>
Code:
using System.Xml;

var xdoc = new XmlDocument();
xdoc.LoadXml("YOUR XML STRING HERE");
var nodes = xdoc.SelectNodes("//DATASET/ROW[accepted='Yes' and cond1='No']");
int count = nodes.Count; // number of matching ROW elements
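Since the question asked for LINQ, here is an equivalent LINQ to XML sketch, assuming the corrected XML above. It counts ROW elements at any depth whose direct children include accepted = "Yes" and cond1 = "No"; the values are trimmed because the original sample had a trailing space in `No `. `AcceptedCounter` is an illustrative name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AcceptedCounter
{
    // Counts every <ROW>, at any depth, whose direct children have
    // <accepted> equal to `accepted` and <cond1> equal to `cond1`.
    public static int CountMatchingRows(string xml, string accepted = "Yes", string cond1 = "No")
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("ROW").Count(row =>
            // (string) on a missing element yields null, so rows without the tag never match
            ((string)row.Element("accepted"))?.Trim() == accepted &&
            ((string)row.Element("cond1"))?.Trim() == cond1);
    }
}
```

The outer ROW is not counted because accepted and cond1 are grandchildren of it, not direct children.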

Want to find a specific node in xml

Currently, I have the xml as the following:
<Node_Parent>
<Column name="ColA" value="A" />
<Column name="ColB" value="B" />
<Column name="ColC" value="C" />
</Node_Parent>
How do I get the value B for ColB? I tried XmlDocument.SelectSingleNode("Node_Parent"), but I cannot get to ColB.
If I change it to <ColB value="B" />, I can use XmlDocument.SelectSingleNode("Node_Parent/ColB").Attributes["value"].Value, but that XML format doesn't look good.
Thanks.
You need to write an XPath query in the SelectSingleNode:
var value = doc.SelectSingleNode(
    "Node_Parent/Column[@name = 'ColB']"   // @name selects the name attribute
).Attributes["value"].Value;
For more info on the XPath query language, see http://www.w3schools.com/xpath.
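A self-contained version of the same lookup, for reference. `ColumnLookup.GetColumnValue` is a hypothetical helper that returns null instead of throwing when the column is missing. Because the column name is spliced into the XPath string, a name containing an apostrophe would break the expression, so treat this as a sketch for trusted input.

```csharp
using System.Xml;

public static class ColumnLookup
{
    // Returns the value attribute of the <Column> whose name attribute equals columnName,
    // or null when no such column (or no value attribute) exists.
    public static string GetColumnValue(string xml, string columnName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode column = doc.SelectSingleNode($"Node_Parent/Column[@name = '{columnName}']");
        return column?.Attributes?["value"]?.Value;
    }
}
```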
Good luck!

LINQ for XML, having trouble reading in multiple elements of varying occurences

Let's say I have the following XML file. My question is: how do I account for a different number of name elements (children of the environment element) when querying with LINQ? I can read the file and even query it when there are the same number of name elements (for example, when they all have 3). My goal is to populate an object that has a list called environment with the names in the XML file. Any help would be appreciated.
<database type="prod">
<name>DB1</name>
<server>
<name>prodserver.net</name>
</server>
<connection>
<name>u1</name>
<password>b1</password>
</connection>
<environment>
<name>test1</name>
<name>test2</name>
<name>test3</name>
</environment>
</database>
<database type="dev">
<name>DB2</name>
<server>
<name>devserver.net</name>
</server>
<connection>
<name>u11</name>
<password>b11</password>
</connection>
<environment>
<name>test1</name>
<name>test2</name>
<name>test3</name>
<name>test4</name>
<name>test5</name>
</environment>
</database>
Or maybe to make it even easier, let's say I have the following
<student name="A" class="1">
<classes>
Math
</classes>
</student>
<student name="B" class="2">
<classes>
Programming
</classes>
</student>
And I run the following code:
var students = doc.Root
.Elements("student")
.Select(x => new Student
{
Name = (string)x.Attribute("name"),
Class = (string)x.Attribute("class"),
Type = (string)x.Elements("classes").Single().Value
})
.ToList();
It works fine, but when I add one more classes element, it breaks:
<student name="A" class="1">
<classes>
Math
</classes>
<classes>
Java
</classes>
</student>
<student name="B" class="2">
<classes>
Programming
</classes>
</student>
It's not really clear what you mean... assuming you have (say) an Environment type with a constructor accepting a List<string> as the names, you'd use:
// If element is the <environment> element
Environment = new Environment(element.Elements("name")
.Select(x => x.Value)
.ToList());
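Applied to the student example, the same idea means replacing Single() with Elements() and storing the result in a list. This sketch assumes a hypothetical Student class with a `List<string> Classes` property, and that the student elements sit under a single root element (the snippets in the question have several top-level elements, which is not a well-formed document on its own).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical Student type: Classes is a list, so any number of <classes> elements fits.
public class Student
{
    public string Name { get; set; }
    public string Class { get; set; }
    public List<string> Classes { get; set; }
}

public static class StudentReader
{
    public static List<Student> Read(XDocument doc) =>
        doc.Root
           .Elements("student")
           .Select(x => new Student
           {
               Name = (string)x.Attribute("name"),
               Class = (string)x.Attribute("class"),
               // Elements() yields zero, one or many matches; Single() throws on more than one.
               Classes = x.Elements("classes").Select(c => c.Value.Trim()).ToList()
           })
           .ToList();
}
```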

Xslt 1.0 XML to excel transformation issue

I'm using the .Net framework's XslCompiledTransform class as the XSLT processor (in other words Xslt 1.0).
My requirement is that I want to convert an XML file to an Excel file (xls file) using XSLT 1.0 and .Net [4.0].
For simplicity and since this is just test code, I'm just considering some simple hard-coding on my Xsl file.
Specifically, my two questions are:
I've tried a bunch of things, yet I can't get my worksheet to be named "wAbc", which is what I'm trying to name it; instead, the worksheet takes my file name. Additional worksheets are not generated either. Below is my XSL file.
Additionally, since this is test code, I can't see the data output on different rows unless I use HTML tags. What I see is "AbcNext line" instead of:
Abc
Next line
So what am I doing wrong? Thank you in advance for your time and help.
I set the content type for my file response as
application/vnd.ms-excel
Here is my test Xsl file:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<xsl:template match="/">
<xsl:processing-instruction name="mso-application">progid='Excel.Sheet'</xsl:processing-instruction>
<Workbook>
<Worksheet ss:Name="wAbc">
<Table ss:ExpandedColumnCount="2" x:FullColumns="1" x:FullRows="1" ss:DefaultRowHeight="15">
<Column ss:AutoFitWidth="20" ss:AutoWidth="65"></Column>
<Row>
<Cell>
<Data ss:Type="String">Abc</Data>
</Cell>
</Row>
<Row>
<Cell>
<Data ss:Type="String">Next line</Data>
</Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<PageSetup>
<Header x:Margin="0.3"/>
<Footer x:Margin="0.3"/>
<PageMargins x:Bottom="0.75" x:Left="0.7" x:Right="0.7" x:Top="0.75"/>
</PageSetup>
</WorksheetOptions>
</Worksheet>
</Workbook>
</xsl:template>
</xsl:stylesheet>
You are using Excel XML as the format here (so, not strictly an XLS file, which is binary).
But in any case, I can see two issues with the Excel XML you are currently outputting. Firstly, I think the ss:AutoFitWidth attribute on the Column element needs to be set to 1 (or 0). I don't think 20 is a valid value.
See http://msdn.microsoft.com/en-us/library/aa140066(v=office.10).aspx for the list of possible values.
Additionally, I think you may (although I am not 100% sure on this) need to set ss:ExpandedRowCount on the Table element as well as ss:ExpandedColumnCount.
See if changing one or both of these fixes your errors.
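For reference, a minimal sketch of running a stylesheet like this with XslCompiledTransform, creating the XmlWriter from the transform's OutputSettings so the stylesheet's output options are respected. It is not a confirmed fix for the worksheet-name problem; `ExcelXmlExport` is an illustrative name, and the result is returned as a string only to keep the example self-contained.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class ExcelXmlExport
{
    // Applies an XSLT 1.0 stylesheet to an input document and returns the output as a string.
    public static string Transform(string xsl, string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(xsl)))
        {
            transform.Load(xslReader);
        }

        var sb = new StringBuilder();
        using (var input = XmlReader.Create(new StringReader(inputXml)))
        // OutputSettings reflects the stylesheet's xsl:output (or its defaults).
        using (var writer = XmlWriter.Create(sb, transform.OutputSettings))
        {
            transform.Transform(input, writer);
        }
        return sb.ToString();
    }
}
```

In a web response you would normally pass the response Stream to XmlWriter.Create instead of a StringBuilder, so the declared encoding matches the bytes actually sent.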
Excel XLS files use a Microsoft proprietary binary format, so there is no way to generate that with XSLT.
The (relatively) new XLSX format is a set of XML files that are zipped together, so in principle you can generate these files with XSLT and then zip them to produce the final file, though that is still not very easy. To see these files, simply unzip an XLSX file.
I suggest you generate a simpler format, either a CSV or an intermediate XML, and then import it into Excel.
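A bare-bones sketch of the CSV route, assuming you have already pulled the rows out of your XML as sequences of strings. `CsvWriter.ToCsv` is a hypothetical helper that quotes fields following the RFC 4180 convention (quote a field containing a comma, quote or line break, and double any embedded quotes).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvWriter
{
    // Quotes a field only when it contains a comma, quote, or line break,
    // doubling any embedded quotes.
    static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // One output line per row, fields separated by commas, CRLF line endings.
    public static string ToCsv(IEnumerable<IEnumerable<string>> rows)
    {
        var sb = new StringBuilder();
        foreach (var row in rows)
        {
            sb.Append(string.Join(",", row.Select(Escape)));
            sb.Append("\r\n");
        }
        return sb.ToString();
    }
}
```

Each value lands on its own row when Excel opens the file, so "Abc" and "Next line" from the question would appear in separate rows without any HTML tags.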
