converting flat txt file to nested xml

converting flat txt file to nested xml - c#

I need to convert a flat text file into a nested XML file, using a conversion table that maps columns to elements.
For example:
txt flat file - Name,Age,Street name,street number,city name
conversion table-
Name-FullName
Age-Age
Street Name-AddressDetail-StreetName
Street Number-AddressDetail-StreetNumber
City Name-AddressDetail-CityName
xml file
<?xml version="1.0" encoding="UTF-8"?>
<Envelop>
<FullName>Steve Mate</FullName>
<Age>22</Age>
<AddressDetail>
<StreetName>Rockford</StreetName>
<StreetNumber>111</StreetNumber>
<CityName>Alimena</CityName>
</AddressDetail>
</Envelop>
What is the best way to implement it?
Can it be used with MVC?
Using XElement?

The following code should work for you
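// Requires: using System; using System.Linq; using System.Xml.Linq;
// Split on any newline style and drop the empty entries that "\r\n" would otherwise produce.
var lines = text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);
var xDoc = new XDocument(new XElement("Envelop"));
foreach (var line in lines)
{
    var values = line.Split('-');
    var parentNode = xDoc.Root;
    // Skip the first item (the flat-file column name); the rest is the element path.
    foreach (var value in values.Skip(1))
    {
        var name = value.Replace(" ", "");
        var node = parentNode.Element(name);
        if (node == null)
        {
            node = new XElement(name);
            parentNode.Add(node);
        }
        parentNode = node;
    }
    parentNode.Value = values.Last();
}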
Split the text on newlines.
For each line, split it on -.
Take the root as the starting node.
For each value in the split line, skipping the first item (the column name):
Remove spaces (you can't have spaces in a node name).
Get the child node with that name.
If it doesn't exist, create it.
Set the current parent to this node.
Finally, set the value of the innermost node to the last item in the split values.
Output:
<Envelop>
<FullName>FullName</FullName>
<Age>Age</Age>
<AddressDetail>
<StreetName>StreetName</StreetName>
<StreetNumber>StreetNumber</StreetNumber>
<CityName>CityName</CityName>
</AddressDetail>
</Envelop>
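The output above only expands the conversion table, so each element holds its own name rather than data. Below is a minimal sketch of how the same idea could be applied to an actual data row. The class name FlatToXml, the method Convert, and the dictionary-based mapping are assumptions made for illustration, not part of the original answer. It also assumes that values contain no commas or quotes, so a plain Split(',') is enough to produce the arrays.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class FlatToXml
{
    // Hypothetical helper: maps one data row onto nested elements.
    // 'mapping' pairs each column name with a '-'-separated element path,
    // e.g. "Street name" -> "AddressDetail-StreetName".
    // A column missing from 'mapping' raises KeyNotFoundException.
    public static XElement Convert(string[] header, string[] row,
                                   IDictionary<string, string> mapping)
    {
        var root = new XElement("Envelop");
        for (int i = 0; i < header.Length; i++)
        {
            XElement parent = root;
            foreach (string name in mapping[header[i].Trim()].Split('-'))
            {
                // Reuse an existing child so siblings share one parent element.
                XElement node = parent.Element(name);
                if (node == null)
                {
                    node = new XElement(name);
                    parent.Add(node);
                }
                parent = node;
            }
            parent.Value = row[i].Trim();
        }
        return root;
    }
}
```

Building the mapping with StringComparer.OrdinalIgnoreCase lets the header "Street name" match the conversion-table entry "Street Name". The returned XElement can be wrapped in an XDocument and saved, which also works from an MVC controller action.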

I've done something similar a number of times.
There is no special trick to this. You will need to parse the input file to get the elements you need, and then you'll need to create an XML file with the data.
I don't know of any existing way to convert your particular text file format.

Related

Need to select Data from XML file C#

I'm trying to get data from my XML file, which is merged with two others, select each venue from the combined result, and add the values to a list so I can manipulate them further.
This is one of my XML files
<?xml version="1.0" encoding="utf-8" ?>
<Funrun>
<Venue name="Roker Park">
<Runner charity="Cancer Research">
<Firstname>Roger</Firstname>
<Lastname>Malibu</Lastname>
<Sponsorship>550</Sponsorship>
</Runner>
<Runner charity="Arthritis UK">
<Firstname>Adam</Firstname>
<Lastname>White</Lastname>
<Sponsorship>340</Sponsorship>
</Runner>
</Venue>
</Funrun >
I need to be able to select the venue name and save it to a list. This is what I've got so far:
List<string> VenueNames = new List<string>();
var doc = XDocument.Load("XMLFile1.xml");
var doc2 = XDocument.Load("XMLFile2.xml");
var doc3 = XDocument.Load("XMLFile3.xml");
var combinedUnique = doc.Descendants("Venue")
.Union(doc2.Descendants("Venue"))
.Union(doc3.Descendants("Venue"));
foreach (var venuename in combinedUnique.Elements("Venue"))
{
VenueNames.Add(venuename.Attribute("name").Value));
}

The easiest way, in my view, is to make the name and charity child elements of the elements they belong to.
I would recommend first reformatting your document so that it looks like this:
<Funrun>
<Venue>
<Name>Roker Park</Name>
<Runner1>
<charity>Cancer Research</charity>
<Firstname>Roger</Firstname>
<Lastname>Malibu</Lastname>
<Sponsorship>550</Sponsorship>
</Runner1>
<Runner2>
<charity>Arthritis UK</charity>
<Firstname>Adam</Firstname>
<Lastname>White</Lastname>
<Sponsorship>340</Sponsorship>
</Runner2>
</Venue>
</Funrun >
Note that you could simplify further by combining all the elements under "Funrun" (for example, every "Venue") into one document and iterating over them without having to switch documents.
Next moving over to C#:
List<string> VenueNames = new List<string>();
var doc = XDocument.Load("XMLFile1.xml");
var doc2 = XDocument.Load("XMLFile2.xml");
var doc3 = XDocument.Load("XMLFile3.xml");
foreach (XElement element in doc.Root.Descendants("Venue"))
{
VenueNames.Add(element.Element("Name").Value.ToString());
}
// Repeat this loop for each document you want to search, changing "doc" to "doc2" and so on.
In short, this code starts at the root element of the XDocument, finds every descendant named "Venue", and for each one copies the value of its "Name" child element into your list as a string.

List<string> xmlFilePaths = new List<string>
{
@"Path\\SomeJson.txt",
@"Path\\SomeJson1.txt"
};
var venues = xmlFilePaths.Select(fp => XDocument.Load(fp).Descendants("Venue")?.FirstOrDefault()?.Attribute("name")?.Value).Distinct().ToList();
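The one-liner above takes only the first venue of each file. If a file can hold several venues, a sketch along these lines could gather every distinct name while keeping the name attribute from the original XML. The class VenueReader and the method GetVenueNames are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class VenueReader
{
    // Collects the distinct "name" attribute of every <Venue> across all documents.
    public static List<string> GetVenueNames(IEnumerable<XDocument> docs)
    {
        return docs
            .SelectMany(d => d.Descendants("Venue"))
            .Select(v => (string)v.Attribute("name")) // null when the attribute is missing
            .Where(n => n != null)
            .Distinct()
            .ToList();
    }
}
```

It could be called as VenueReader.GetVenueNames(new[] { doc, doc2, doc3 }) with the three documents loaded in the question. The explicit cast to string avoids the NullReferenceException that .Value would throw on a venue without a name.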

How to get the xml node value into a string

How can I get an XML node value into a string?
I am getting this error:
Data at the root level is invalid. Line 1, position 1.
The error is raised on this line:
xmldoc.LoadXml(xmlFile);
My XML:
<?xml version="1.0" encoding="utf-8" ?>
<UOM>
<!-- The selected currency used will be stored here for Code reference" -->
<ActiveCurrencyType>
<ActiveCurrency>U.S.Dollar</ActiveCurrency>
<ActiveCode>USD</ActiveCode>
<ActiveSymbol>$</ActiveSymbol>
</ActiveCurrencyType>
<!-- The selected Dimension used will be stored here for Code reference -->
<ActiveDimension>
<ActiveDimensionUOM>Inches</ActiveDimensionUOM>
<ActiveDimensionSymbol>.in</ActiveDimensionSymbol>
</ActiveDimension>
<!-- The selected weight used will be stored here for Code reference -->
<ActiveWeight>
<ActiveWeightUOM>Pounds</ActiveWeightUOM>
<ActiveWeightSymbol>lb</ActiveWeightSymbol>
</ActiveWeight>
</UOM>
C# code
string xmlFile = Server.MapPath("~/HCConfig/HCUOM.xml");
XmlDocument xmldoc = new XmlDocument();
xmldoc.LoadXml(xmlFile);
XmlNodeList nodeList = xmldoc.GetElementsByTagName("ActiveDimensionSymbol");
string ActiveDimensionSymbol = string.Empty;
foreach (XmlNode node in nodeList)
{
ActiveDimensionSymbol = node.InnerText;
}
How can I achieve this?

You're calling the wrong method; LoadXml doesn't do what you think it does.
Use xmldoc.Load(xmlFile); because that method takes a file path as input. LoadXml expects a string containing XML.
The exception points to that mistake: the string being parsed is a file path, not XML.
After this change, the string ActiveDimensionSymbol contains .in when I run this locally.
If you want to use LoadXml, you should first read the whole file into a string, for example:
xmldoc.LoadXml(File.ReadAllText(xmlFile));
but calling File.ReadAllText is pure overhead when there is a method that accepts a file path.
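To make the difference concrete, here is a small sketch that puts the two calls side by side. The class UomReader and its method names are invented for this illustration; only XmlDocument.Load, XmlDocument.LoadXml and GetElementsByTagName come from the framework.

```csharp
using System.Xml;

public static class UomReader
{
    // Load takes a file path (or URI) and reads the document from there.
    public static string ReadFromFile(string path, string tagName)
    {
        var doc = new XmlDocument();
        doc.Load(path);
        return FirstInnerText(doc, tagName);
    }

    // LoadXml takes the XML markup itself, already held in a string.
    public static string ReadFromString(string xml, string tagName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return FirstInnerText(doc, tagName);
    }

    // Returns the text of the first matching element, or an empty string.
    private static string FirstInnerText(XmlDocument doc, string tagName)
    {
        XmlNodeList nodes = doc.GetElementsByTagName(tagName);
        return nodes.Count > 0 ? nodes[0].InnerText : string.Empty;
    }
}
```

Passing a path to ReadFromString reproduces the "Data at the root level is invalid" error from the question, because the parser finds a path where it expects the first character of an XML document.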

You can use the Descendants() method to get all XElements by certain name, found in the System.Xml.Linq namespace.
XDocument doc = XDocument.Load("XMLFile1.xml");
string[] allActiveWeightUOMs = doc.Descendants("ActiveWeightUOM").Select(o => o.Value).ToArray();
// allActiveWeightUOMs : "Pounds" ...

The method you are using to load the XML expects the XML as a string, not the path to an XML file. You can use XmlDocument.Load instead of XmlDocument.LoadXml.

Try this code; it works fine with this XML:
string xmlFile = Server.MapPath("~/HCConfig/HCUOM.xml");
XDocument doc = XDocument.Load(xmlFile );
var nodeList = doc.Descendants("ActiveDimensionSymbol");
string ActiveDimensionSymbol = string.Empty;
foreach (var node in nodeList)
{
ActiveDimensionSymbol = node.Value;
}

Fetch values from an xml file and write into a excel sheet row-wise

I have to write a simple WPF app with a text box and a button; the text box takes a language code such as "en", "fr", "ru", etc.
I have a lot of files with a huge amount of data like this. Consider this file:
<?xml version="1.0" encoding="UTF-8"?>
<params>
<btestsir xml:lang="fr" tId="HHXAF">Test Sirène :</btestsir>
<btestsir xml:lang="en" tId="HHXAF">Test Siren :</btestsir>
<btestsir xml:lang="pt" tId="HHXAF">Testar sirene:</btestsir>
<btestsir xml:lang="ru" tId="HHXAF">Тест сирены:</btestsir>
<btestbeep xml:lang="fr" tId="HHXA2">Test Bip :</btestbeep>
<btestbeep xml:lang="en" tId="HHXA2">Test Beep :</btestbeep>
<btestbeep xml:lang="pt" tId="HHXA2">Testar aviso sonoro:</btestbeep>
<btestbeep xml:lang="ru" tId="HHXA2">Тест гудка:</btestbeep>
</params>
Now if I select "en" through my application, and click on the button then only two strings whose xml:lang="en" values match will be copied to an excel sheet. and would display something like this.
For English language
For Russian language
I have tried this
StreamReader reader = new StreamReader(strFilepath);
string line;
while(null != (line = reader.ReadLine()))
{
string[] s1 = line.Split('>');
string[] s2 = s1[1].Split('<');
if(s1[0].Contains("xml:lang="))
{
}
}
The logic is simple: first I split every line on ">".
So s1[0] will have <btestsir xml:lang="fr" tId="HHXAF"
s1[1] will have Test Sirène :</btestsir>
The problem I am facing is how to fetch the "key" for a specific xml:lang value and put that key-value pair into the Excel sheet, as shown in the picture. Fetching the "value" is easy: s2[0] will have the value.
Each key-value pair should be matched and written to the Excel sheet before continuing with the next line.
Edit: the key-value pairs should go into a separate Excel file per language: excel-sheet.en.xlsx will contain all "en", excel-sheet.fr.xlsx will contain all "fr", etc.
As I said I have huge files and it should work seamlessly without manual intervention.
Can you help me please!
Thanks

I would use XPath selectors rather than string manipulation on an XML file. You could declare a node list variable and populate it, for example:
using System.Xml;
...
// XmlNamespaceManager predefines the "xml" prefix, so @xml:lang can be resolved.
var ns = new XmlNamespaceManager(xml.NameTable);
XmlNodeList childNodes = xml.SelectNodes("/params/*[@xml:lang='en']", ns);
...
and go on from there, working with each XmlNode, such as
foreach (XmlNode xnd in childNodes)
{
    // ...
}

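Something like this should do it:
// Requires: using System.Xml; using System.Xml.Linq; using System.Xml.XPath;
XDocument doc = XDocument.Load("sample.xml");
// XPathSelectElements is a LINQ to XML extension, so the document must be an XDocument.
// XmlNamespaceManager predefines the "xml" prefix, so @xml:lang can be resolved.
var ns = new XmlNamespaceManager(new NameTable());
var elements = doc.Root.XPathSelectElements("//*[@xml:lang='en']", ns);
foreach (var child in elements)
{
    // child.Name  >> will give you btestsir
    // child.Value >> will give you Test Siren :
}
To create and manage spreadsheets, use EPPlus.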
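As an alternative to XPath, the same selection could be sketched with plain LINQ to XML, where XNamespace.Xml + "lang" names the xml:lang attribute. The class Translations and the method ForLanguage are hypothetical, and the sketch assumes each element name occurs at most once per language, as in the sample file; a duplicate would make ToDictionary throw. Writing the pairs to a spreadsheet is left to whichever Excel library you choose.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class Translations
{
    // Returns element-name -> text pairs for one language code, e.g. "en".
    public static Dictionary<string, string> ForLanguage(XDocument doc, string lang)
    {
        XName langAttr = XNamespace.Xml + "lang"; // the xml:lang attribute
        return doc.Root
            .Elements()
            .Where(e => (string)e.Attribute(langAttr) == lang)
            .ToDictionary(e => e.Name.LocalName, e => e.Value);
    }
}
```

Each dictionary entry then corresponds to one row of the per-language sheet, with the element name as the key column and the text as the value column.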

Poorly defined XML, get node and contents of all child nodes as string concat with spaces?

Here's some fantastic example XML:
<root>
<section>Here is some text<mightbe>a tag</mightbe>might <not attribute="be" />. Things are just<label>a mess</label>but I have to parse it because that's what needs to be done and I can't <font stupid="true">control</font> the source. <p>Why are there p tags here?</p>Who knows, but there may or may not be spaces around them so that's awesome. The point here is, there's node soup inside the section node and no definition for the document.</section>
</root>
I'd like to just grab the text from the section node and all sub-nodes as strings. BUT, note that there may or may not be spaces around the sub-nodes, so I want to pad each sub-node with a space on both sides.
Here's a more precise example of what input might look like, and what I'd like output to be:
<root>
<sample>A good story is the<book>Hitchhikers Guide to the Galaxy</book>. It was published<date>a long time ago</date>. I usually read at<time>9pm</time>.</sample>
</root>
I'd like the output to be:
A good story is the Hitchhikers Guide to the Galaxy. It was published a long time ago. I usually read at 9pm.
Note that the child nodes don't have spaces around them, so I need to pad them otherwise the words run together.
I was attempting to use this sample code:
XDocument doc = XDocument.Parse(xml);
foreach(var node in doc.Root.Elements("section"))
{
output += String.Join(" ", node.Nodes().Select(x => x.ToString()).ToArray()) + " ";
}
But the output includes the child tags, and is not going to work out.
Any suggestions here?
TL;DR: Was given node soup xml and want to stringify it with padding around child nodes.

In case you have tags nested to an unknown depth (e.g. <date>a <i>long</i> time ago</date>), you might also want to recurse so that the formatting is applied consistently throughout. For example:
private static string Parse(XElement root)
{
    return root
        .Nodes()
        .Select(a => a.NodeType == XmlNodeType.Text ? ((XText)a).Value : Parse((XElement)a))
        // Empty elements such as <not attribute="be" /> would otherwise make Aggregate throw.
        .DefaultIfEmpty(String.Empty)
        .Aggregate((a, b) => String.Concat(a.Trim(), b.StartsWith(".") ? String.Empty : " ", b.Trim()));
}

You could try using xpath to extract what you need
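// XPathDocument(string) expects a URI, so wrap the XML text in a reader.
// Requires: using System.IO; using System.Xml.XPath;
var docNav = new XPathDocument(new StringReader(xml));
// Create a navigator to query with XPath.
var nav = docNav.CreateNavigator();
// Find the text of every element under the root node
var expression = "/root//*/text()";
// Execute the XPath expression; a node-set query yields an iterator.
XPathNodeIterator textNodes = nav.Select(expression);
// Do some stuff with textNodes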
....
References:
Querying XML, XPath syntax

Here is a possible solution following your initial code:
private string extractSectionContents(XElement section)
{
string output = "";
foreach(var node in section.Nodes())
{
if(node.NodeType == System.Xml.XmlNodeType.Text)
{
output += string.Format("{0}", node);
}
else if(node.NodeType == System.Xml.XmlNodeType.Element)
{
output += string.Format(" {0} ", ((XElement)node).Value);
}
}
return output;
}
One drawback of this padding logic is that a period placed right after an element will be preceded by a space.

You are looking at "mixed content" nodes. There is nothing particularly special about them: just get all child nodes (text nodes are nodes too) and join their values with a space.
Something like
var result = String.Join(" ",
    root.Nodes().Select(x => x is XText ? ((XText)x).Value : ((XElement)x).Value));
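Joining with a space leaves a stray space before punctuation that follows an element. One possible way to tidy that up, sketched under the assumption that a space before . , ; : ! ? is never wanted, is to join every text node at any depth and then normalize the whitespace. The class MixedContent and the method Flatten are names made up for this example.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class MixedContent
{
    public static string Flatten(XElement element)
    {
        // Every text node at any depth, in document order, separated by a space.
        string joined = string.Join(" ",
            element.DescendantNodes().OfType<XText>().Select(t => t.Value));
        // Collapse runs of whitespace into single spaces.
        string collapsed = Regex.Replace(joined, @"\s+", " ").Trim();
        // Drop the space that padding leaves in front of punctuation.
        return Regex.Replace(collapsed, @" ([.,;:!?])", "$1");
    }
}
```

Because it walks DescendantNodes rather than Nodes, nested tags are handled without explicit recursion, and empty elements simply contribute nothing.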

How to replace all "values" in an XML document with "0.0" using C# (preferably LINQ)?

This is not a homework; I need this for my unit tests.
Sample input: <rows><row><a>1234</a><b>Hello</b>...</row><row>...</rows>.
Sample output: <rows><row><a>0.0</a><b>0.0</b>...</row><row>...</rows>.
You may assume that the document starts with <rows> and that parent node has children named <row>. You do not know the name of nodes a, b, etc.
For extra credit: how to make this work with an arbitrary well-formed, "free-form" XML?
I have tried this with a regex :) without luck. I could make it "non-greedy on the right", but not on the left. Thanks for your help.
EDIT: Here is what I tried:
private static string ReplaceValuesWithZeroes(string gridXml)
{
Assert.IsTrue(gridXml.StartsWith("<row>"), "Xml representation must start with '<row>'.");
Assert.IsTrue(gridXml.EndsWith("</row>"), "Xml representation must end with '<row>'.");
gridXml = "<deleteme>" + gridXml.Trim() + "</deleteme>"; // Fake parent.
var xmlDoc = XDocument.Parse(gridXml);
var descendants = xmlDoc.Root.Descendants("row");
int rowCount = descendants.Count();
for (int rowNumber = 0; rowNumber < rowCount; rowNumber++)
{
var row = descendants.ElementAt(0);
Assert.AreEqual<string>(row.Value /* Does not work */, String.Empty, "There should be nothing between <row> and </row>!");
Assert.AreEqual<string>(row.Name.ToString(), "row");
var rowChildren = row.Descendants();
foreach (var child in rowChildren)
{
child.Value = "0.0"; // Does not work.
}
}
// Not the most efficient but still fast enough.
return xmlDoc.ToString().Replace("<deleteme>", String.Empty).Replace("</deleteme>", String.Empty);
}

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
foreach (XmlElement el in doc.SelectNodes("//*[not(*)]"))
el.InnerText = "0.0";
xml = doc.OuterXml;
or to be more selective about non-empty text nodes:
foreach (XmlText el in doc.SelectNodes("//text()[.!='']"))
el.InnerText = "0.0";

XDocument xml = XDocument.Load(myXmlFile);
foreach (var element in xml.Descendants("row").SelectMany(r => r.Elements()))
{
element.Value = "0.0";
}
Note that this general search with Descendants("row") is not very efficient, but it satisfies the 'arbitrary format' requirement.
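For the "free-form" case where not even the row element name is known, one option is to overwrite the text nodes themselves. The following is a sketch only; ValueMasker and ReplaceValues are hypothetical names, and it assumes whitespace-only text should be left alone.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ValueMasker
{
    // Replaces every non-blank text node in the document with 'replacement'.
    public static string ReplaceValues(string xml, string replacement = "0.0")
    {
        XDocument doc = XDocument.Parse(xml);
        // Snapshot the text nodes first, then overwrite each one in place.
        var textNodes = doc.DescendantNodes().OfType<XText>()
            .Where(t => !string.IsNullOrWhiteSpace(t.Value))
            .ToList();
        foreach (XText text in textNodes)
        {
            text.Value = replacement;
        }
        // DisableFormatting keeps the output on one line, like the sample input.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Unlike setting XElement.Value, this never removes child elements, so it is safe on elements that mix text and markup. Attribute values are not touched.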

You should take a look at HTML Agility Pack. It allows you to treat HTML documents as well-formed XML, so you can parse them and change values.

I think you can use the Regex.Replace method in C#. I used the regex below to replace all the XML element values:
[>]+[a-zA-Z0-9]+[<]+
This matches a '>' followed by letters or digits and then a '<'.
I was able to use this successfully in Notepad++. You could also write a small program using it.
