Nested XML to CSV, generic code - c#

This is a tough one.
I have to process XML files from our suppliers. The files contain products and their variants (color, size, and so on). Some suppliers send flat XML, with one node per variant and the product name repeated on every row. Others send nested XML, with each product as a parent node and its variants as child nodes.
I searched the site before asking this question. Every answer I found assumes the XML structure is known.
In my case, it is not. Tomorrow I could get files from a new supplier with a different structure and different column names.
My goal is to convert the nested XML into flat XML (i.e. only top-level row nodes, with the product name repeated on every newly created row).
Is this even possible? Any idea?
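One possible approach, as a minimal sketch rather than a definitive answer: walk the tree recursively, treat elements without child elements as columns, treat elements with child elements as nested records, and let every nested record inherit its parent's columns. The class and method names below (XmlFlattener, Flatten, ToCsv) are made up for illustration. The sketch assumes column names do not collide between levels and that leaf names are not repeated within one record; in both cases the later value would overwrite the earlier one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: flattens nested XML into rows without knowing the schema.
public static class XmlFlattener
{
    // Returns one dictionary (column name -> value) per flattened row.
    public static List<Dictionary<string, string>> Flatten(XElement root)
    {
        var rows = new List<Dictionary<string, string>>();
        foreach (var record in root.Elements())
            Expand(record, new Dictionary<string, string>(), rows);
        return rows;
    }

    private static void Expand(XElement element,
                               Dictionary<string, string> inherited,
                               List<Dictionary<string, string>> rows)
    {
        // Copy the parent's values so sibling branches do not share state.
        var current = new Dictionary<string, string>(inherited);

        // Children without child elements become columns of the current row.
        foreach (var leaf in element.Elements().Where(e => !e.HasElements))
            current[leaf.Name.LocalName] = leaf.Value;

        // Children with child elements are treated as nested records.
        var nested = element.Elements().Where(e => e.HasElements).ToList();
        if (nested.Count == 0)
        {
            rows.Add(current); // Bottom of this branch: emit one row.
            return;
        }
        foreach (var child in nested)
            Expand(child, current, rows);
    }

    // Writes the rows as CSV, using the union of all column names as the header.
    public static string ToCsv(List<Dictionary<string, string>> rows)
    {
        var columns = rows.SelectMany(r => r.Keys).Distinct().ToList();
        var lines = new List<string> { string.Join(",", columns.Select(c => Quote(c))) };
        foreach (var row in rows)
        {
            lines.Add(string.Join(",", columns.Select(c =>
            {
                string value;
                return Quote(row.TryGetValue(c, out value) ? value : "");
            })));
        }
        return string.Join(Environment.NewLine, lines);
    }

    private static string Quote(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

Flat files pass through unchanged, because a record that has only leaf children produces exactly one row. A call would look like XmlFlattener.ToCsv(XmlFlattener.Flatten(XDocument.Load(path).Root)), where path is whatever file the supplier sent.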

Related

How to insert the new element on XMLNode in C#

I am trying to insert a new element into an XmlNode in C# code.
How do I insert the <delimiter>##</delimiter> element inside the "/TestBooks/template/field" node? (Screenshot1)
Inside the <field> element, I need to insert the <delimiter> element, based on the Id element <Id>11-09-2020-505</Id>. (Screenshot2)

First of all, you really shouldn't use pictures in questions. We are taking the time to type you an answer, so you can take the time to copy, paste, and format your question.
I'd like to answer your question, but I'm concerned about it because it implies that you are adding markers to aid in parsing the data. You really shouldn't need to parse XML yourself.
There are many great XML parsers, including one built into C#, so you should not roll your own.
In the XML standard the order of the children is not defined. If you put a child as the "first" child, there is no reason to expect that a parser would list it first.
(To have order in children you should just add an order attribute.)
For these reasons it should not matter where in the list of children you add the child.
So we can tell you how to add a child, but we can't "put it in a specific spot" since children don't have an order.
In summary, it is not possible to do what you ask.
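The part the answer says is possible, adding a child, could be sketched as follows. The XML shape is an assumption, since the original structure was only shown in screenshots: the sketch supposes that <Id> is a direct child of <field>. The class and method names (DelimiterExample, AddDelimiter) are hypothetical. The calls used (SelectSingleNode, CreateElement, AppendChild) are standard System.Xml members.

```csharp
using System;
using System.Xml;

// Hypothetical helper: appends <delimiter>##</delimiter> to the matching <field>.
public static class DelimiterExample
{
    public static XmlDocument AddDelimiter(string xml, string id)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Assumption: <Id> is a direct child of <field>.
        // Note: an id containing an apostrophe would break this XPath string.
        XmlNode field = doc.SelectSingleNode(
            "/TestBooks/template/field[Id='" + id + "']");

        if (field != null)
        {
            // New elements must be created by the owning document.
            XmlElement delimiter = doc.CreateElement("delimiter");
            delimiter.InnerText = "##";
            field.AppendChild(delimiter);
        }
        return doc;
    }
}
```

If no <field> with the given Id exists, the document is returned unchanged.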

How does DataSet.ReadXml(TextReader) handle nested XML elements?

I'm a PHP programmer, and I'm trying to understand some code which I think is ASP.NET. This is also my first foray into XML. I don't have access to a Windows box to test on.
I need to produce XML output that third-party code can use. The third party wants to use our data instead of the data source they are currently using. I don't want to replicate the current XML structure exactly because it doesn't map well to our data.
The structure of the current XML is very flat. There are only a few nested elements and the third party doesn't make use of any of them. The third party does have a sub-contracted programmer, but he is very busy. Also, I want to understand, for myself, how this works.
This is an excerpt from a plugin for a custom CMS:
Dim obj_set As New Data.DataSet()
Using obj_reader As New System.Xml.XmlTextReader("http://www.example.com/xml_output.php")
obj_set.ReadXml(obj_reader)
End Using
Dim obj_view As Data.DataView = obj_set.Tables("profile").DefaultView
obj_view.Sort = "cname"
Dim obj_data As Data.DataTable = obj_view.ToTable()
So from what I have gathered so far, this code:
reads the XML file into a DataSet
sorts the profile table by cname
creates a new DataTable from the sorted view
There is other code that stores the new table to, and retrieves it from, cache. Then there is code that loops through the table rows and maps the column names to template variables.
Sample excerpt of current XML structure:
<profiles>
    <profile>
        <cname>ABC Corporation</cname>
        <fname>John</fname>
        <lname>Smith</lname>
        <sector>Widgets</sector>
        <subsectors>
            <subsector>Basic Widgets</subsector>
            <subsector>Fancy Widgets</subsector>
        </subsectors>
    </profile>
</profiles>
So what happens to the subsectors data? Does the reader create a separate table for it? If so, how are the tables related?
Our data includes multiple contacts per company. I could just create multiple elements at the top level fname1, fname2, fname3 to keep the flat structure. But I was thinking a nested structure makes sense for this kind of data. The problem is that I don't understand if such a structural change is compatible with the plugin code.
What kinds of changes would need to be made to the plugin code to make use of nested elements?

I was stumped on this myself, and I don't know if you still are, but for reference to others here's what I found.
You are right in assuming that the reader creates a separate table for it. Because a DataSet can hold multiple tables, each "level" of elements gets its own table: any nested element that has nested elements of its own gets its own table. Essentially, the reader keeps creating tables until it reaches the bottom of the XML tree. If an element has no children, it gets added as a cell in the data table.
In your case,
dataSet.Tables[0] will hold the top-level nodes (all the <profiles>). But since the nested element <profile> has elements of its own, Tables[0] will likely only have one row. One level deeper, dataSet.Tables[1] will hold all <profile> nodes. However, since <subsectors> has the sub-element <subsector>, it will not be in Tables[1], but rather in Tables[2], which goes one level deeper still.
I know it has been a while since this was asked, but hopefully this will be helpful.
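Rather than relying on a guess about table positions, the inferred structure can be inspected directly. The following is a minimal sketch, with hypothetical names (DataSetInspector, Load, Describe), that reads XML into a DataSet and lists the tables, columns, and relations the reader actually created. It makes no claim about which index each table ends up at; it only prints what it finds.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper: shows what DataSet.ReadXml inferred from an XML string.
public static class DataSetInspector
{
    public static DataSet Load(string xml)
    {
        var dataSet = new DataSet();
        using (var reader = new StringReader(xml))
        {
            // Same ReadXml(TextReader) overload the question asks about.
            dataSet.ReadXml(reader);
        }
        return dataSet;
    }

    public static void Describe(DataSet dataSet)
    {
        foreach (DataTable table in dataSet.Tables)
        {
            Console.WriteLine("Table: " + table.TableName +
                              " (" + table.Rows.Count + " rows)");
            foreach (DataColumn column in table.Columns)
                Console.WriteLine("  Column: " + column.ColumnName);
        }

        // Relations show how nested tables link back to their parents.
        foreach (DataRelation relation in dataSet.Relations)
        {
            Console.WriteLine("Relation: " + relation.ParentTable.TableName +
                              " -> " + relation.ChildTable.TableName);
        }
    }
}
```

Referring to tables by name, as the plugin does with Tables("profile"), is more robust than using an index, because the index depends on how the reader infers the schema.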

receiving everyday XML files - 12 types need to do search on these everyday

Asp.NET - C#.NET
I need advice on the design problem below.
I will receive XML files every day. The number of files varies, e.g. 10 XML files yesterday, 56 today, and maybe 161 tomorrow.
There are 12 types (12 XSDs), and at the top of each file there is an attribute called FormType, e.g. FormType="1", FormType="2", and so on up to FormType="12".
All of them have common fields like Name, adres, Phone.
But, for example, FormType=1 is for Construction, FormType=2 is for IT, FormType=3 is for Hospital, FormType=4 is for Advertisement, etc.
As I said, all of them have common attributes.
Requirements:
I need a search screen so the user can search the contents of these XML files, but I don't have any clue how to approach this. For example: search for text in some attributes of the XML files received between Date_From and Date_To.
Problem:
I've heard about putting the XML in a binary field and running XPath queries against it, but I don't know the right words to search for on Google.
I was thinking of creating one big database table, reading all the XML files, and putting their contents into it. But the issue is that some XML attributes are huge, around 2-3 pages, while the same attributes in other XML files are empty.
So I would be creating an NVARCHAR(MAX) column for every XML attribute, and after some time my database would be a monster.
Can someone advise on the best approach to handle this?

I'm not 100% sure I understand your problem. I'm guessing that the query's supposed to return individual XML documents that meet some kind of user-specified criteria.
In that event, my starting point would probably be to implement a method for querying a single XML document, i.e. one that returns true if the document's a hit and false otherwise. In all likelihood, I'd make the query parameter an XPath query, but who knows? Here's a simple example:
// Requires: using System.Linq; using System.Xml.Linq; using System.Xml.XPath;
public bool TestXml(XDocument d, string query)
{
    return d.XPathSelectElements(query).Any();
}
Next, I need a store of XML documents to query. Where does that store live, and what form does it take? At a certain level, those are implementation details that my application doesn't care about. They could live in a database, or the file system. They could be cached in memory. I'd start by keeping it simple, something like:
public IEnumerable<XDocument> XmlDocuments()
{
    DirectoryInfo di = new DirectoryInfo(XmlDirectoryPath);
    foreach (FileInfo fi in di.GetFiles())
    {
        // FileInfo has no Filename property; FullName is the full path.
        yield return XDocument.Load(fi.FullName);
    }
}
Now I can get all of the documents that fulfill a request like this:
public IEnumerable<XDocument> GetDocuments(string query)
{
    return XmlDocuments().Where(x => TestXml(x, query));
}
The thing that jumps out at me when I look at this problem: I have to parse my documents into XDocument objects to query them. That's going to happen whether they live in a database or the file system. (If I stick them in a database and write a stored procedure that does XPath queries, as someone suggested, I'm still parsing all of the XML every time I execute a query; I've just moved all that work to the database server.)
That's a lot of I/O and CPU time that gets spent doing the exact same thing over and over again. If the volume of queries is anything other than tiny, I'd consider building a List<XDocument> the first time GetDocuments() is called and come up with a scheme of keeping that list in memory until new XML documents are received (or possibly updating it when new XML documents are received).
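The caching idea could be sketched like this. It is only one possible scheme, not the author's implementation: the class name XmlDocumentCache and its members are hypothetical, and the loader delegate stands in for a method such as XmlDocuments() above so the cache does not care where the files live.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical cache: parses the documents once and reuses them across queries.
public sealed class XmlDocumentCache
{
    private readonly Func<IEnumerable<XDocument>> _loader;
    private readonly object _lock = new object();
    private List<XDocument> _documents;

    public XmlDocumentCache(Func<IEnumerable<XDocument>> loader)
    {
        _loader = loader;
    }

    // Returns every cached document with at least one match for the XPath query.
    public List<XDocument> GetDocuments(string query)
    {
        List<XDocument> snapshot;
        lock (_lock)
        {
            // Parse the files only on the first call after an invalidation.
            if (_documents == null)
                _documents = _loader().ToList();
            snapshot = _documents;
        }
        return snapshot.Where(d => d.XPathSelectElements(query).Any()).ToList();
    }

    // Call this when new XML files arrive so the next query reloads them.
    public void Invalidate()
    {
        lock (_lock)
        {
            _documents = null;
        }
    }
}
```

The trade-off is memory: every parsed document stays resident until Invalidate() is called, which is only reasonable if the daily volume fits comfortably in RAM.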

Map multiple xml feeds to one object

Using C# I want to be able to map a number of xml feeds to one custom object. Each xml feed has the same kind of data but has its own naming convention.
Ideally I would like to store a mapping for each XML feed and apply it automatically when copying the XML data to my object. I would like to do this because the system may grow to hundreds of feeds, so storing the mappings would be easier to maintain than writing code for each feed.
So for example, my object consists of
ID, Name
And xml feed one is
Code, ProductName
xml feed two is
UniqueID, FullName
so the mappings would be
ID -> Code
Name -> ProductName
and
ID -> UniqueID
Name -> FullName
What would be the best way of achieving this?

I would create a config section in your config file. You could then have a node for each feed, with child nodes that hold the mapping information. Each child node's name would match a property of your C# object, and its value would be the corresponding node name in the XML feed. You could even use a full XPath expression as the value if the feed is more complicated.
<feed url="">
<id>Code</id>
<Name>ProductName</Name>
</feed>
Then in your app you could load the feed and look up its node in the config file to find out how to map fields from the XML file to your C# object.
This is just one approach, but it would be easy to configure and could grow without changing the application unless your C# object changes.
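Applying such a mapping could look roughly like the sketch below. It assumes the mapping has already been read into an XElement shaped like the <feed> node above; wiring it into an actual configuration section is left out. The Product class and the FeedMapper names are hypothetical, and the property lookup ignores case so that a node named id or ID both resolve to the ID property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical target object with the two fields from the question.
public class Product
{
    public string ID { get; set; }
    public string Name { get; set; }
}

public static class FeedMapper
{
    // Turns a <feed> mapping node into: property name -> feed element name.
    public static Dictionary<string, string> LoadMapping(XElement feedConfig)
    {
        return feedConfig.Elements()
                         .ToDictionary(e => e.Name.LocalName, e => e.Value);
    }

    // Copies values from one feed item into a new Product using the mapping.
    public static Product Map(XElement item, IDictionary<string, string> mapping)
    {
        var product = new Product();
        foreach (var pair in mapping)
        {
            PropertyInfo property = typeof(Product).GetProperty(
                pair.Key,
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);
            XElement source = item.Element(pair.Value);

            // Skip mappings that match no property or no element in this item.
            if (property != null && source != null)
                property.SetValue(product, source.Value, null);
        }
        return product;
    }
}
```

With this shape, adding a feed means adding one more mapping node; the code only changes when Product gains or loses a property.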

Handling duplicate nodes in XML

Scenario:
I am parsing values from an XML file using C# and have the following method:
private static string GetXMLNodeValue(XmlNode basenode, string strNodePath)
{
    // Look the node up once and reuse the result.
    XmlNode node = basenode.SelectSingleNode(strNodePath);
    return node != null ? node.InnerText : String.Empty;
}
To get a particular value from the XML file, I generally pass the root node and a path like "parentnode/item"
I recently ran into an issue where two nodes at the same document level share the same name.
Why:
The duplicate nodes all need to be read and utilized; they have child nodes with differing values (which can distinguish them). At present I just have two nodes, each named <Process> and sub-nodes named <Name> which contain unique names. Other sub-nodes contain various values such as memory usage. When processing and storing the values for the sub-nodes, I would essentially ignore the parent node name and use my own names based on the sub-node <Name> value.
Question:
What is the best way to get the values for duplicate-named nodes distinctly? My thought was to load all values matching the node name into an array and then using the array index to refer to them. I'm not sure how to implement that, though. (I'm not well-versed in XML navigation.)
Sample XML
<?xml version="1.0" ?>
<info>
    <processes>
        <process>
            <name>program1</name>
            <memory_use>24.8</memory_use>
        </process>
        <process>
            <name>program2</name>
            <memory_use>19.0</memory_use>
        </process>
    </processes>
</info>

Use the SelectNodes method instead. It gives you a list of all nodes matching your XPath expression.
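A minimal sketch of that suggestion against the sample XML from the question: SelectNodes returns every <process> node, and each one is then keyed by its <name> child instead of by an array index. The class and method names (ProcessReader, ReadMemoryUse) are made up for illustration, and the sketch assumes the root passed in is the <info> element.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: reads all same-named <process> nodes, keyed by <name>.
public static class ProcessReader
{
    public static Dictionary<string, string> ReadMemoryUse(XmlNode root)
    {
        var result = new Dictionary<string, string>();

        // SelectNodes returns all matches, not just the first one.
        XmlNodeList processes = root.SelectNodes("processes/process");
        foreach (XmlNode process in processes)
        {
            // Paths are relative to the current <process> node.
            XmlNode name = process.SelectSingleNode("name");
            XmlNode memory = process.SelectSingleNode("memory_use");
            if (name != null && memory != null)
                result[name.InnerText] = memory.InnerText;
        }
        return result;
    }
}
```

Passing doc.DocumentElement from an XmlDocument loaded with the sample gives one entry per process. If two processes share the same <name>, the later one overwrites the earlier one in this sketch.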

The answer to your question is, "it depends".
It depends on what you intend to do with the "duplicate" nodes. What does it mean for there to be more than one with the same name? Was it an error in the program that generated the XML? Was it correct, and an unlimited number of such nodes is permitted? What do they mean when there are more than one?
You need to answer those questions first, before designing code that processes "duplicate" nodes.
