Reading XML Data and Storing in DataTable - c#

I have a log file like this:
This is the segment 1
============================
<MAINELEMENT><ELEMENT1>10-10-2013 10:10:22.444</ELEMENT1><ELEMENT2>1111</ELEMENT2>
<ELEMENT3>Message 1</ELEMENT3></MAINELEMENT>
<MAINELEMENT><ELEMENT1>10-10-2013 10:10:22.555</ELEMENT1><ELEMENT2>1111</ELEMENT2>
<ELEMENT3>Message 2</ELEMENT3></MAINELEMENT>
This is the segment 2
============================
<MAINELEMENT><ELEMENT1>10-11-2012 10:10:22.444</ELEMENT1><ELEMENT2>2222</ELEMENT2>
<ELEMENT3>Message 1</ELEMENT3></MAINELEMENT>
<MAINELEMENT><ELEMENT1>10-11-2012 10:10:22.555</ELEMENT1><ELEMENT2>2222</ELEMENT2>
<ELEMENT3>Message 2</ELEMENT3></MAINELEMENT>
How can I read this into a DataTable, completely excluding the "This is the segment 1" and "This is the segment 2" lines and the ====== lines?
I would like the DataTable to have the columns "ELEMENT1", "ELEMENT2", and "ELEMENT3", filled with the content between those tags, in the order the lines appear in the file.
Inserting the records must not change their order in the table.

HtmlAgilityPack seems to be a good tool for what you need:
using System.Data;
using System.Linq;
using HtmlAgilityPack;
class Program
{
static void Main(string[] args)
{
var doc = new HtmlDocument();
doc.Load("log.txt");
var dt = new DataTable();
bool hasColumns = false;
foreach (HtmlNode row in doc
.DocumentNode
.SelectNodes("//mainelement"))
{
if (!hasColumns)
{
hasColumns = true;
foreach (var column in row.ChildNodes
.Where(node => node.GetType() == typeof(HtmlNode)))
{
dt.Columns.Add(column.Name);
}
}
dt.Rows.Add(row.ChildNodes
.Where(node => node.GetType() == typeof(HtmlNode))
.Select(node => node.InnerText).ToArray());
}
}
}

You could do this, where stringData is the contents of your file:
var array = stringData.Split(new[] { "============================" }, StringSplitOptions.RemoveEmptyEntries);
var document = new XDocument(new XElement("Root"));
foreach (var item in array)
{
if(!item.Contains("<"))
continue;
var subDocument = XDocument.Parse("<Root>" + item.Substring(0, item.LastIndexOf('>') + 1) + "</Root>");
foreach (var element in subDocument.Root.Descendants("MAINELEMENT"))
{
document.Root.Add(element);
}
}
var table = new DataTable();
table.Columns.Add("ELEMENT1");
table.Columns.Add("ELEMENT2");
table.Columns.Add("ELEMENT3");
var rows =
document.Descendants("MAINELEMENT").Select(el =>
{
var row = table.NewRow();
row["ELEMENT1"] = el.Element("ELEMENT1").Value;
row["ELEMENT2"] = el.Element("ELEMENT2").Value;
row["ELEMENT3"] = el.Element("ELEMENT3").Value;
return row;
});
foreach (var row in rows)
{
table.Rows.Add(row);
}
foreach (DataRow dataRow in table.Rows)
{
Console.WriteLine("{0},{1},{2}", dataRow["ELEMENT1"], dataRow["ELEMENT2"], dataRow["ELEMENT3"]);
}

I'm not sure where your problem is.
You can use XElement to read the XML and create the DataTable manually.
For reading the XML, see "Xml Parsing using XElement".
Then you can create the DataTable dynamically.
Here's an example of creating a DataTable in code:
https://sites.google.com/site/bhargavaclub/datatablec
But why do you want to use a DataTable? There are a lot of downsides...
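The XElement approach suggested in this answer could be sketched roughly as follows. This is only a sketch under stated assumptions: every line that is not XML (the segment titles and the ====== rulers) does not start with "<", and the remaining lines form well-formed XML once joined. The names LogToDataTable, Parse, and the wrapper element Root are made up for illustration.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Xml.Linq;

static class LogToDataTable
{
    // Builds a DataTable from the log text, keeping only the lines that hold XML.
    public static DataTable Parse(string logText)
    {
        // Assumption: non-XML lines never start with '<', so this drops the
        // segment titles and the ====== lines while keeping file order.
        var xmlLines = logText
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Where(line => line.TrimStart().StartsWith("<", StringComparison.Ordinal));

        // Wrap the fragments in a single root so they parse as one document.
        var root = XElement.Parse("<Root>" + string.Concat(xmlLines) + "</Root>");

        var table = new DataTable();
        table.Columns.Add("ELEMENT1");
        table.Columns.Add("ELEMENT2");
        table.Columns.Add("ELEMENT3");

        // Elements() yields the MAINELEMENT nodes in document order,
        // so the rows are added in the same order as in the file.
        foreach (var main in root.Elements("MAINELEMENT"))
        {
            table.Rows.Add(
                (string)main.Element("ELEMENT1"),
                (string)main.Element("ELEMENT2"),
                (string)main.Element("ELEMENT3"));
        }
        return table;
    }
}
```

It could be called as LogToDataTable.Parse(File.ReadAllText("log.txt")), with System.IO imported.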

Related

How to skip a column if its name is already in the DataTable

I'm using HTML Agility Pack to scrape a web page into a DataTable. However, the website has multiple tables with the same column names, so the columns for the second table cannot be added.
An error is raised because the column "2020" has already been added.
My code is below:
public void WebDataScrap()
{
try
{
//Get the content of the URL from the Web
const string url = "https://www.wsj.com/market-data/quotes/MY/XKLS/0146/financials/annual/cash-flow";
var web = new HtmlWeb();
var doc = web.Load(url);
const string classValue = "cr_dataTable"; //cr_datatable
//var nodes = doc.DocumentNode.SelectNodes($"//table[@class='{classValue}']") ?? Enumerable.Empty<HtmlNode>();
var resultDataset = new DataSet();
foreach (HtmlNode table in doc.DocumentNode.SelectNodes($"//table[@class='{classValue}']") ?? Enumerable.Empty<HtmlNode>())
{
var resultTable = new DataTable(table.Id);
foreach (HtmlNode row in table.SelectNodes("//tr"))
{
var headerCells = row.SelectNodes("th");
if (headerCells != null)
{
foreach (HtmlNode cell in headerCells)
{
resultTable.Columns.Add(cell.InnerText);
}
}
var dataCells = row.SelectNodes("td");
if (dataCells != null)
{
var dataRow = resultTable.NewRow();
for (int i = 0; i < dataCells.Count; i++)
{
dataRow[i] = dataCells[i].InnerText;
}
resultTable.Rows.Add(dataRow);
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
}
The URL I am trying to scrape: https://www.wsj.com/market-data/quotes/MY/XKLS/0146/financials/annual/cash-flow
I tried looping to skip columns that have the same name, but then, when I debug, I get an error saying the column cannot be found.
Is there any way to solve this? In the end I need to export the DataTable to a CSV/Excel file.
Thanks
I think you want to do this instead:
foreach (HtmlNode table in doc.DocumentNode.SelectNodes($"//table[@class='{classValue}']") ?? Enumerable.Empty<HtmlNode>())
{
var resultTable = new DataTable(table.Id);
// select all the headers and add them to the table
var headerCells = table.SelectNodes("thead/tr/th");
if (headerCells != null)
{
foreach (HtmlNode cell in headerCells)
{
resultTable.Columns.Add(cell.InnerText);
}
}
// select all the rows and add them to the table
foreach (HtmlNode row in table.SelectNodes("tbody/tr"))
{
var dataCells = row.SelectNodes("td");
if (dataCells != null)
{
var dataRow = resultTable.NewRow();
for (int i = 0; i < dataCells.Count; i++)
{
dataRow[i] = dataCells[i].InnerText;
}
resultTable.Rows.Add(dataRow);
}
}
}
The header section and the data section each have their own loop rather than the header section being nested in the data loop. We're also being more explicit about where we want data from: the header should come from thead/tr/th and the data should come from tbody/tr.
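If the header row of a single table still repeats a caption, DataTable.Columns.Add throws a DuplicateNameException. One possible workaround, which is not part of the answer above, is to make each name unique before adding it rather than skipping the column. AddUniqueColumn is a hypothetical helper and the "_2", "_3" suffix scheme is an arbitrary choice.

```csharp
using System.Data;

static class DataTableColumnHelper
{
    // Adds a column, appending _2, _3, ... when the name is already taken.
    // Hypothetical helper; the suffix format is an assumption.
    public static DataColumn AddUniqueColumn(DataTable table, string name)
    {
        string candidate = name;
        int suffix = 2;

        // DataColumnCollection.Contains compares names case-insensitively.
        while (table.Columns.Contains(candidate))
        {
            candidate = name + "_" + suffix;
            suffix++;
        }
        return table.Columns.Add(candidate);
    }
}
```

In the header loop this would replace resultTable.Columns.Add(cell.InnerText) with DataTableColumnHelper.AddUniqueColumn(resultTable, cell.InnerText). Keeping every column, instead of skipping duplicates, keeps the column count aligned with the td cells that are assigned by index.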

C# Reading CSV file with SQL conditions

I am using the CsvHelper library to read a CSV file, and I can read the file successfully. However, I cannot use SQL conditions to filter the values. How can I do that without using SQL Server? I am really stuck on this.
It was very easy with the Pandas and Pandasql libraries in Python, but it is proving hard in C#.
My code:
public static void Main(string[] args)
{
var fileInfo = new FileInfo(@"filePath");
using (TextReader reader = fileInfo.OpenText())
using (var csvReader = new CsvReader(reader))
{
csvReader.Configuration.Delimiter = ",";
csvReader.Configuration.HasHeaderRecord = false;
csvReader.Configuration.IgnoreQuotes = true;
csvReader.Configuration.TrimFields = true;
csvReader.Configuration.WillThrowOnMissingField = false;
while (csvReader.Read())
{
var myStrinVar = csvReader.GetField<string>(0);
Console.Write(myStrinVar); //SELECT * FROM table...
}
}
}
I would suggest using LINQ to filter your results.
https://msdn.microsoft.com/en-us/library/bb397906.aspx
Say you have some class MyClass that the lines in your file can be deserialized into.
For example:
public class MyClass
{
public int ID { get; set; }
}
var records = csvReader.GetRecords<MyClass>().ToList(); // csvReader is the CsvReader from the question
var filtered = records.Where(r => r.ID >= 10);
That example is a bit contrived but you can use any boolean expression you like in the where clause.
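To make the mapping from a SQL condition to LINQ more concrete, here is a minimal sketch that works on records already held in memory. CsvRecord stands in for MyClass, and its Name property, the RecordFilter class, and the sample condition are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for MyClass; the Name column is hypothetical.
public class CsvRecord
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public static class RecordFilter
{
    // Roughly: SELECT * FROM records WHERE ID >= 10 AND Name LIKE 'A%' ORDER BY ID
    // (unlike many SQL collations, this prefix match is case-sensitive).
    public static List<CsvRecord> Filter(IEnumerable<CsvRecord> records)
    {
        return records
            .Where(r => r.ID >= 10
                        && r.Name != null
                        && r.Name.StartsWith("A", StringComparison.Ordinal))
            .OrderBy(r => r.ID)
            .ToList();
    }
}
```

WHERE maps to Where, ORDER BY to OrderBy, and a column list in SELECT would map to Select.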
I know this is too late for the OP, but the issue with the accepted answer is that you have to read the entire result set into memory, which may not be tenable for large files. Also, you can extend the code below to get the top N rows without reading the entire CSV if the matches occur early in the file.
public static void Main(string[] args)
{
var fileInfo = new FileInfo(@"filePath");
var where = ""; //Code to set up where clause part of query goes here
using (TextReader reader = fileInfo.OpenText())
using (var csvReader = new CsvReader(reader))
{
csvReader.Configuration.Delimiter = ",";
csvReader.Configuration.HasHeaderRecord = false;
csvReader.Configuration.IgnoreQuotes = true;
csvReader.Configuration.TrimFields = true;
csvReader.Configuration.WillThrowOnMissingField = false;
DataTable dt = null;
while (csvReader.Read())
{
//Use the first row to initialize the columns.
if (dt == null)
{
dt = new DataTable();
for (var i = 0; i < csvReader.FieldCount; i++)
{
var fieldType = csvReader.GetFieldType(i);
DataColumn dc;
if (fieldType.IsNullableType())
{
dc = new DataColumn(csvReader.GetName(i), Nullable.GetUnderlyingType(fieldType));
dc.AllowDBNull = true;
}
else
dc = new DataColumn(csvReader.GetName(i), fieldType);
dt.Columns.Add(dc);
}
}
//Map DataReader to DataRow
var newRow = dt.Rows.Add();
foreach(DataColumn col in dt.Columns)
{
newRow[col.ColumnName] = csvReader[col.ColumnName];
}
//Create a temporary DataView and filter it with the where clause.
DataView dv = new DataView(dt);
dv.RowFilter = where;
var data = dv.Count > 0 ? dv[0] : null;
if(data != null)
{
//Row in here matches your where clause.
//Code to read this row or do something with it.
}
//Empty the temporary data table.
dt.Rows.Clear();
}
}
}
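The where string in the code above is left empty. As a rough illustration of what could go there, the sketch below applies a filter expression through DataView.RowFilter to a small in-memory table. The table contents and the names RowFilterSketch, BuildSample, and CountMatches are made up; the columns are given explicit types because the sketch assumes ID should be compared as a number.

```csharp
using System.Data;

static class RowFilterSketch
{
    // Builds a small sample table with typed columns (hypothetical data).
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Rows.Add(5, "Alice");
        dt.Rows.Add(12, "Bob");
        dt.Rows.Add(20, "Anna");
        return dt;
    }

    // Returns how many rows satisfy the filter expression.
    public static int CountMatches(DataTable dt, string where)
    {
        var view = new DataView(dt) { RowFilter = where };
        return view.Count;
    }
}
```

With this sample data, "ID >= 10" matches two rows and "ID >= 10 AND Name LIKE 'A%'" matches one.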

Convert IEnumerable string array to datatable

I have a csv file delimited with pipe(|). I am reading it using the following line of code:
IEnumerable<string[]> lineFields = File.ReadAllLines(FilePath).Select(line => line.Split('|'));
Now, I need to bind this to a GridView. So I am creating a dynamic DataTable as follows:
DataTable dt = new DataTable();
int i = 0;
foreach (string[] order in lineFields)
{
if (i == 0)
{
foreach (string column in order)
{
DataColumn _Column = new DataColumn();
_Column.ColumnName = column;
dt.Columns.Add(_Column);
i++;
//Response.Write(column);
//Response.Write("\t");
}
}
else
{
int j = 0;
DataRow row = dt.NewRow();
foreach (string value in order)
{
row[j] = value;
j++;
//Response.Write(column);
//Response.Write("\t");
}
dt.Rows.Add(row);
}
//Response.Write("\n");
}
This works fine. But I want to know if there is a better way to convert IEnumerable<string[]> to a DataTable. I need to read many CSVs like this, so I think the above code might have performance issues.
Starting from .Net 4:
use ReadLines.
DataTable FileToDataTable(string FilePath)
{
var dt = new DataTable();
IEnumerable<string[]> lineFields = File.ReadLines(FilePath).Select(line => line.Split('|'));
dt.Columns.AddRange(lineFields.First().Select(i => new DataColumn(i)).ToArray());
foreach (var order in lineFields.Skip(1))
dt.Rows.Add(order);
return dt;
}
(Edit: instead of this code, use the code from @Jodrell's answer, which avoids enumerating the file twice.)
Before .Net 4:
use streaming:
DataTable FileToDataTable1(string FilePath)
{
var dt = new DataTable();
using (var st = new StreamReader(FilePath))
{
// process the first line (the header)
if (st.Peek() >= 0)
{
var order = st.ReadLine().Split('|');
dt.Columns.AddRange(order.Select(i => new DataColumn(i)).ToArray());
}
while (st.Peek() >= 0)
dt.Rows.Add(st.ReadLine().Split('|'));
}
return dt;
}
Since, in your linked example, the file has a header row, you can do this:
const char Delimiter = '|';
var dt = new DataTable();
using (var m = File.ReadLines(filePath).GetEnumerator())
{
m.MoveNext();
foreach (var name in m.Current.Split(Delimiter))
{
dt.Columns.Add(name);
}
while (m.MoveNext())
{
dt.Rows.Add(m.Current.Split(Delimiter));
}
}
This reads the file in one pass.

C# treeview getting duplicate nodes

At the beginning of this week I was having a problem with a TreeView not displaying children. Everything got worked out through recursion. However, a new and unexpected problem arose: the methods I'm using produce duplicate nodes for some specific DataTables.
Having this DataTable of two columns:
ParentOT ChildOT
20120601 20120602
20120601 20120603
20120601 20120604
20120601 20120611
20120601 20120612
20120602 20120605
20120602 20120606
20120602 20120607
20120602 20120608
20120602 20120610
20120603 20120607
20120603 20120608
20120603 20120609
If I try to display it in a TreeView, I get the right tree, but five times consecutively (once for each time the parent appears as a parent in the ParentOT records).
The Methods are these:
private TreeView cargarOtPadres(TreeView trv, int otPadre, DataTable datos)
{
if (datos.Rows.Count > 0)
{
foreach (DataRow dr in datos.Select("OTPadre="+ otPadre))
{
TreeNode nodoPadre = new TreeNode();
nodoPadre.Text = dr["OTPadre"].ToString();
trv.Nodes.Add(nodoPadre);
cargarSubOts(ref nodoPadre, int.Parse(dr["OTPadre"].ToString()), datos);
}
}
return trv;
}
private void cargarSubOts(ref TreeNode nodoPadre, int otPadre, DataTable datos)
{
DataRow[] otHijas = datos.Select("OTPadre=" + otPadre);
foreach (DataRow drow in otHijas)
{
TreeNode hija = new TreeNode();
hija.Text = drow["OTHija"].ToString();
nodoPadre.Nodes.Add(hija);
cargarSubOts(ref hija, int.Parse(drow["OTHija"].ToString()), datos);
}
}
With tables where there is just one top-level parent and it appears only once, it works great. How can I prevent the TreeView from duplicating nodes?
I'll leave the answer here for the sake of completeness. This solution came courtesy of @King King:
public static class TreeViewExtension
{
public static void LoadFromDataTable(this TreeView tv, DataTable dt)
{
var parentNodes = dt.AsEnumerable()
.GroupBy(row => (string)row[0])
.ToDictionary(g => g.Key, value => value.Select(x => (string)x[1]));
Stack<KeyValuePair<TreeNode, IEnumerable<string>>> lookIn = new Stack<KeyValuePair<TreeNode, IEnumerable<string>>>();
HashSet<string> removedKeys = new HashSet<string>();
foreach (var node in parentNodes)
{
if (removedKeys.Contains(node.Key)) continue;
TreeNode tNode = new TreeNode(node.Key);
lookIn.Push(new KeyValuePair<TreeNode, IEnumerable<string>>(tNode, node.Value));
while (lookIn.Count > 0)
{
var nodes = lookIn.Pop();
foreach (var n in nodes.Value)
{
IEnumerable<string> children;
TreeNode childNode = new TreeNode(n);
nodes.Key.Nodes.Add(childNode);
if (parentNodes.TryGetValue(n, out children))
{
lookIn.Push(new KeyValuePair<TreeNode, IEnumerable<string>>(childNode, children));
removedKeys.Add(n);
}
}
}
tv.Nodes.Add(tNode);
}
}
}
Create this class, then use it like this:
treeView1.LoadFromDataTable(DataTable);
Be sure to use it with a DataTable whose columns are of type string. If you have a table with int columns, you can do something like this:
DataTable stringDataTable = intDataTable.Clone();
stringDataTable.Columns[0].DataType = typeof(string);
stringDataTable.Columns[1].DataType = typeof(string);
foreach (DataRow dr in intDataTable.Rows)
{
stringDataTable.ImportRow(dr);
}
treeView1.LoadFromDataTable(stringDataTable);

Read multiple xml tables (under the same Root node) into DataTables/DataSet

I have an XML source document with multiple "report" nodes under the Root node. I need to read each "report" node into its own DataTable. It looks like I'll either need to transform my source XML data using an xsl stylesheet to get it in the format that'll work nicely or iterate through my xml elements like so:
namespace XmlParse2
{
class Program
{
static IEnumerable<string> expectedFields = new List<string>() { "Field1", "Field2", "Field3", "Field4" };
static void Main(string[] args)
{
string xml = @"<Root>
<Report1>
<Row>
<Field1>data1-1</Field1>
<Field2>data1-2</Field2>
<Field4>data1-4</Field4>
</Row>
<Row>
<Field1>data2-1</Field1>
<Field2>data2-2</Field2>
</Row>
</Report1>
<Report2>
<Row>
<Field1>data1-1</Field1>
<Field4>data1-4</Field4>
</Row>
<Row>
<Field1>data2-1</Field1>
<Field3>data2-3</Field3>
</Row>
</Report2>
</Root>";
DataTable report1 = new DataTable("Report1");
report1.Columns.Add("Field1");
report1.Columns.Add("Field2");
report1.Columns.Add("Field3");
report1.Columns.Add("Field4");
DataTable report2 = new DataTable("Report2");
report2.Columns.Add("Field1");
report2.Columns.Add("Field2");
report2.Columns.Add("Field3");
report2.Columns.Add("Field4");
var doc = XDocument.Parse(xml);
var report1Data = doc.Root.Elements("Report1").Elements("Row").Select(record => MapRecord(record));
var report2Data = doc.Root.Elements("Report2").Elements("Row").Select(record => MapRecord(record));
report1 = addRows(report1, report1Data);
report2 = addRows(report2, report2Data);
Console.ReadLine();
}
public static Dictionary<string, string> MapRecord(XElement element)
{
var output = new Dictionary<string, string>();
foreach (var field in expectedFields)
{
bool hasField = element.Elements(field).Any();
if (hasField)
{
output.Add(field, element.Elements(field).First().Value);
}
}
return output;
}
public static DataTable addRows(DataTable table, IEnumerable<Dictionary<string, string>> data)
{
foreach (Dictionary<string, string> dict in data)
{
DataRow row = table.NewRow();
foreach(var item in dict)
{
row[item.Key] = item.Value;
}
table.Rows.Add(row);
}
return table;
}
}
}
The problem with my source data seems to be that both Report1 and Report2 have child nodes named "Row", and my attempts to use DataSet.ReadXml are not successful because it groups all nodes named Row into one DataTable instead of separate DataTables.
What am I missing?
XDocument xdoc = XDocument.Load(path_to_xml);
var tables = xdoc.Root.Elements()
.Select(report => {
DataTable table = new DataTable(report.Name.LocalName);
var fields = report
.Descendants("Row")
.SelectMany(row => row.Elements()
.Select(e => e.Name.LocalName))
.Distinct();
foreach(string field in fields)
table.Columns.Add(field);
foreach(var row in report.Descendants("Row"))
{
DataRow dr = table.NewRow();
foreach(var field in row.Elements())
dr[field.Name.LocalName] = (string)field;
table.Rows.Add(dr);
}
return table;
});
This query returns an IEnumerable<DataTable>. Each DataTable contains only those columns that have values in the XML. The column names are retrieved from the XML and can be different for each table. For your sample, the structure will look like this:
DataTable: Report1
Columns: Field1, Field2, Field4
DataTable: Report2
Columns: Field1, Field3, Field4
The data from all rows is added to the corresponding table.
You can extract some of the code into methods, which makes it easier to understand:
XDocument xdoc = XDocument.Load(path_to_xml);
var tables = xdoc.Root.Elements()
.Select(report => CreateTableFrom(report));
And methods:
private static DataTable CreateTableFrom(XElement report)
{
DataTable table = new DataTable(report.Name.LocalName);
table.Columns.AddRange(GetColumnsOf(report));
foreach (var row in report.Descendants("Row"))
{
DataRow dr = table.NewRow();
foreach (var field in row.Elements())
dr[field.Name.LocalName] = (string)field;
table.Rows.Add(dr);
}
return table;
}
private static DataColumn[] GetColumnsOf(XElement report)
{
return report.Descendants("Row")
.SelectMany(row => row.Elements().Select(e => e.Name.LocalName))
.Distinct()
.Select(field => new DataColumn(field))
.ToArray();
}
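Because Select is evaluated lazily, the tables query above builds new DataTable objects every time it is enumerated. If the goal is a DataSet, one option is to enumerate the query once and add each table as it is produced. The sketch below assumes the tables have distinct names, as Report1 and Report2 do; ReportDataSetBuilder and ToDataSet are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Data;

static class ReportDataSetBuilder
{
    // Enumerates the (possibly lazy) sequence exactly once and
    // collects the tables into a single DataSet.
    public static DataSet ToDataSet(IEnumerable<DataTable> tables)
    {
        var dataSet = new DataSet();
        foreach (var table in tables)
        {
            // Throws DuplicateNameException if two tables share a name.
            dataSet.Tables.Add(table);
        }
        return dataSet;
    }
}
```

A call such as ReportDataSetBuilder.ToDataSet(tables) would then allow lookups by report name, for example dataSet.Tables["Report1"].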
