Looping through XML elements to add data - C#

I don't know exactly how to word my question, so apologies up front. I have an XML file with elements like the following:
<Allow_BenGrade>
    <Amount BenListID="0">0</Amount>
</Allow_BenGrade>
<Add_Earnings_NonTaxable>
    <Amount AddEarnID="0">0</Amount>
</Add_Earnings_NonTaxable>
I am interested in Allow_BenGrade, where I want to add multiple elements. I have a list of 3 items, but when I loop through it to write to the file, only the last item is written, so instead of having 3 elements inside Allow_BenGrade I end up with one (the last item in the list). My code is below. Please help, thank you.
var query = from nm in xelement.Elements("EmployeeFinance")
            select new Allowance {
                a_empersonalID = (int)nm.Element("EmpPersonal_Id"),
                a_allbengradeID = (int)nm.Element("Grade_Id")
            };
var x = query.ToList();
foreach (var xEle in x)
{
    var qryBenListGrade = from ee in context.Employee_Employ
                          join abg in context.All_Inc_Ben_Grade
                          on ee.Grade_Id equals abg.GradeID
                          join abl in context.All_Inc_Ben_Listing
                          on abg.All_Inc_Ben_ListingID equals abl.ID
                          where ee.Employee_Personal_InfoEmp_id == xEle.a_empersonalID && abg.GradeID == xEle.a_allbengradeID && (abl.Part_of_basic == "N" && abl.Status == "A" && abl.Type_of_earnings == 2)
                          //abl.Approved_on !=null &&
                          select new
                          {
                              abl.ID,
                              abl.Amount,
                              abg.GradeID,
                              ee.Employee_Personal_InfoEmp_id,
                              abl.Per_Non_Taxable,
                              abl.Per_Taxable
                          };
    var y = qryBenListGrade.ToList();
    //xEle.a_Amount = 0;
    foreach (var tt in y)
    {
        Debug.WriteLine("amount: " + tt.Amount + " emp id: " + tt.Employee_Personal_InfoEmp_id + " ben list id: " + tt.ID);
        // xEle.a_Amount = xEle.a_Amount + tt.Amount;
        var result = from element in doc.Descendants("EmployeeFinance")
                     where int.Parse(element.Element("EmpPersonal_Id").Value) == tt.Employee_Personal_InfoEmp_id
                     select element;
        foreach (var ele in result)
        {
            ele.Element("Allow_BenGrade").SetElementValue("Amount", tt.Amount);
            //ele.Element("Allow_BenGrade").Element("Amount").SetAttributeValue("BenListID", tt.ID);
        }
    }
    doc.Save(GlobalClass.GlobalUrl);
}

SetElementValue will, as the name suggests, set the value of the existing Amount element, so each iteration overwrites the previous one. You need to Add a new element instead:
ele.Element("Allow_BenGrade").Add(new XElement("Amount",
    new XAttribute("BenListID", tt.ID),
    tt.Amount));
Let me know if that solves it for you.

The XElement.SetElementValue Method:
Sets the value of a child element, adds a child element, or removes a
child element.
Also:
The value is assigned to the first child element with the specified
name. If no child element with the specified name exists, a new child
element is added. If the value is null, the first child element with
the specified name, if any, is deleted.
This method does not add child nodes or attributes to the specified
child element.
You should use the XElement.Add Method instead.
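The difference between the two methods can be seen in a small console sketch. The Rows array is a hypothetical stand-in for the database query results in the question; only the Allow_BenGrade, Amount and BenListID names come from the original XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AddVersusSetDemo
{
    // Hypothetical stand-in for the rows returned by the database query.
    private static readonly (int Id, decimal Amount)[] Rows =
    {
        (1, 100m), (2, 250m), (3, 75m)
    };

    // SetElementValue targets the first <Amount> child every time,
    // so each iteration overwrites the value written by the previous one.
    public static XElement WithSetElementValue()
    {
        var grade = new XElement("Allow_BenGrade");
        foreach (var row in Rows)
            grade.SetElementValue("Amount", row.Amount);
        return grade;
    }

    // Add appends a new <Amount> child for every row.
    public static XElement WithAdd()
    {
        var grade = new XElement("Allow_BenGrade");
        foreach (var row in Rows)
            grade.Add(new XElement("Amount",
                new XAttribute("BenListID", row.Id),
                row.Amount));
        return grade;
    }

    public static void Main()
    {
        Console.WriteLine(WithSetElementValue()); // one <Amount> child
        Console.WriteLine(WithAdd());             // three <Amount> children
    }
}
```

Note that the placeholder <Amount BenListID="0"> already present in the file would remain next to the added elements unless it is removed first, for example by calling RemoveNodes() on the Allow_BenGrade element before the loop.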


Trying to identify text nodes with HtmlAgilityPack

I am trying to identify the text nodes in HTML text formatted like the samples below
sample text 1 : <strong>[Hot Water][Steam][Electric]</strong> Preheating Coil
sample text 2 : <b><span>[Steam] [Natural Gas Fired] [Electric] [Steam to steam]</span></b><span> Humidifier</span><br>
using the code below:
public static string IdentifyHTMLTagsAndRemove(string htmlText)
{
    _ = htmlText ?? throw new ArgumentNullException(nameof(htmlText));
    var document = new HtmlDocument();
    document.LoadHtml(htmlText);
    var rootNode = document.DocumentNode;
    // get first and last text nodes
    var nonEmptyTextNodes = rootNode.SelectNodes("//text()[not(self::text())]") ?? new HtmlNodeCollection(null);
    //if (nonEmptyTextNodes.Count == 0)
    //{
    //    return rootNode.OuterHtml;
    //}
    if (nonEmptyTextNodes.Count > 0)
    {
        var firstTextNode = nonEmptyTextNodes[0];
        var lastTextNode = nonEmptyTextNodes[^1];
        // get all br nodes in html string,
        var breakNodes = rootNode.SelectNodes("//br") ?? new HtmlNodeCollection(null);
        var lastTextNodeLengthIndex = lastTextNode.OuterStartIndex + lastTextNode.OuterLength;
        foreach (var breakNode in breakNodes)
        {
            if (breakNode == null)
                continue;
            // check index of br nodes against first and last text nodes
            // and remove br nodes that sit outside text nodes
            if (breakNode.OuterStartIndex <= firstTextNode.OuterStartIndex
                || breakNode.OuterStartIndex >= lastTextNodeLengthIndex)
            {
                breakNode.Remove();
            }
        }
    }
    return rootNode.OuterHtml;
}
But it consistently fails here:
var nonEmptyTextNodes = rootNode.SelectNodes("//text()[not(self::text())]") ?? new HtmlNodeCollection(null);
nonEmptyTextNodes always has a count of zero, and I am unsure what I am doing wrong in the code above.
Could anyone please point me in the right direction? Many thanks in advance.
In addition to Siebe's answer, I'd also like to point out an inefficiency in the code that trims start/end BR tags. If you look at the HtmlAgilityPack code for HtmlNode operations, you'll see that whenever nodes are removed, the SetChanged() method is called on the parent (and its parent, all the way up). The next time you check the start/end indexes of anything in the tree, they need to be recalculated. So this code could be made to run much faster if you instead just create a temporary list of all the nodes to be removed, then remove them after they've all been identified.
var lastTextNodeLengthIndex = lastTextNode.OuterStartIndex + lastTextNode.OuterLength;
var breakNodesToRemove = rootNode.SelectNodes("//br")?.Where(node => node.OuterStartIndex <= firstTextNode.OuterStartIndex || node.OuterStartIndex >= lastTextNodeLengthIndex).ToList();
breakNodesToRemove?.ForEach(a => a.Remove());
reference: https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlNode.cs
Not sure what you are trying to achieve with
//text()[not(self::text())]
It tries to select text() nodes that are not text() nodes, so nothing will ever be found. If you just use
//text()
it will select all text() nodes.
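A quick way to compare the expressions is to count what each one matches. The sketch below uses the built-in System.Xml.XmlDocument on a well-formed sample as a stand-in for HtmlAgilityPack, which is an assumption made only to keep it dependency-free; the XPath itself is the same. The predicate [normalize-space()] is the standard XPath 1.0 way to keep only text nodes that are not empty or whitespace-only, which may be what the original predicate was meant to do.

```csharp
using System;
using System.Xml;

public static class TextNodeXPathDemo
{
    // Well-formed variant of "sample text 2" from the question.
    private const string Sample =
        "<div><b><span>[Steam] [Electric]</span></b><span> Humidifier</span><br/></div>";

    // Returns how many nodes the given XPath expression selects in the sample.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Contradictory predicate: a text node is never "not a text node".
        Console.WriteLine(Count("//text()[not(self::text())]"));
        // Every text node in the document.
        Console.WriteLine(Count("//text()"));
        // Only text nodes with non-whitespace content.
        Console.WriteLine(Count("//text()[normalize-space()]"));
    }
}
```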

Loop to iterate up parent nodes until it finds specific tag

What I'm doing is finding a specific value within an XML document and then I want to iterate upwards through each parent until it finds the parent with a specific tag name.
List<XElement> fieldReferences = new List<XElement>();
fieldReferences.AddRange(element.XPathSelectElements(string.Format("descendant::nameValue[@idref='{0}']", fieldName)));
fieldReferences.AddRange(element.XPathSelectElements(string.Format("descendant::nameValue[@value='{0}']", fieldName)));
string parentIterator = ".Parent";
string attributeValue = ".Attribute('id').Value";
string parentElementName = ".Name";
foreach (var value in fieldReferences)
{
var parentField = string.Format("{0}{1}", parentIterator, parentElementName);
while (value + parentField != "private" || value + parentField != "public")
{
// keep appending .Parent until it finds what it needs
}
//var parentField = value.Parent.Parent.Parent.Parent.Attribute("id").Value;
outputFields.Add(parentField, name.FirstOrDefault());
}
The issue that I'm having is that parentField will always be evaluated as a string so it'll never actually check the .Parent.Name property of value.
I don't work often with C# so I'm sure there's a much easier way to do this so my question is: How can I get my parentField string to evaluate the way I want OR how can I do this in a different way to achieve the same end result?
EDIT: Here's what my XML looks like. XPathSelectElements gets the nameValue element, and I want to iterate through each parent element until I find the private tag.
<private id="I need to iterate upwards through each parent element until I find this one">
    <value>
        <request>
            <nameValues>
                <nameValue idref="I found this" />
                <nameValue value=""/>
            </nameValues>
        </request>
    </value>
</private>
So you don't actually need to do this many string operations to then go crazy with XPath. Once you found your child target element, you can just use the Parent property on the XElement iteratively until you find the XElement with a private/public tag. So that gives us this code:
foreach (XElement element in fieldReferences)
{
    XElement currentElement = element;
    while (currentElement.Parent != null)
    {
        currentElement = currentElement.Parent;
        if (currentElement.Name == "private" || currentElement.Name == "public") {
            outputFields.Add(currentElement, /* not sure what you want here */);
            break;
        }
    }
}
So currentElement would start out as the element with the nameValue tag from your example. In the while loop, each iteration currentElement changes to its parent node until there is no more parent or currentElement has become a private or a public tag. If the latter is the case, it gets appended to your result.
You can use the XElement.Ancestors function to get a list of all the elements that contain the nodes you found, then just select the ones you want using LINQ. No need for any loops.
var results = fieldReferences.Select(element => element.Ancestors()
.Where(ancestor => ancestor.Name == "public" ||
ancestor.Name == "private")
.FirstOrDefault());
Note that this will go all the way up the tree, and may have issues if there are multiple matching ancestors (or no matching ancestor). Let me know if that is a problem for you, and what result you want in that case, and I can make adjustments.
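As a minimal sketch of the Ancestors approach: Ancestors() yields the parent first and then moves outward, so FirstOrDefault with a predicate returns the nearest matching ancestor, or null when there is none. The sample document, the id values and the FindOwnerId helper are hypothetical and exist only for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NearestAncestorDemo
{
    // Hypothetical sample: a <private> element nested inside a <public> one.
    private const string Sample = @"<root>
  <public id='outer'>
    <private id='inner'>
      <value><request><nameValues>
        <nameValue idref='fieldA' />
        <nameValue value='' />
      </nameValues></request></value>
    </private>
  </public>
</root>";

    // Returns the id of the closest private/public ancestor of the
    // nameValue element whose idref equals fieldName, or null if none exists.
    public static string FindOwnerId(string fieldName)
    {
        var root = XElement.Parse(Sample);
        var hit = root
            .XPathSelectElements($"descendant::nameValue[@idref='{fieldName}']")
            .FirstOrDefault();
        var owner = hit?.Ancestors()
            .FirstOrDefault(a => a.Name == "private" || a.Name == "public");
        // The explicit XAttribute-to-string conversion returns null for a null attribute.
        return (string)owner?.Attribute("id");
    }

    public static void Main()
    {
        Console.WriteLine(FindOwnerId("fieldA"));
        Console.WriteLine(FindOwnerId("missing") ?? "(no match)");
    }
}
```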

How to select all the tags "a" in the current child node?

In HtmlAgilityPack, when I'm selecting one node like this:
var node1 = htmlDoc.GetElementbyId("some_id");
I want to get all child "a" tags in its children. However, this doesn't work because it returns null:
foreach (var childItem in node1.ChildNodes) {
var a = childItem.SelectNodes("a") // null
var a = childItem.SelectNodes("/a") // null
var a = childItem.SelectNodes("//a") // not null but select all the "a" tags on the whole(!) page, not only the ones within current childItem
}
As you can see, the last call selects all the "a" tags on the whole(!) page, not only the ones within the current childItem. Why is that, and how can I make it select only the ones inside childItem?
You simply need to add a dot (.) at the beginning of the XPath to make it relative to current childItem :
var a = childItem.SelectNodes(".//a");
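The effect of the leading dot can be checked with a small sketch. It uses System.Xml.XmlDocument on a hypothetical well-formed sample rather than HtmlAgilityPack, purely to stay dependency-free; the XPath semantics are the same. One difference to keep in mind: XmlNode.SelectNodes returns an empty list when nothing matches, while HtmlAgilityPack returns null, as seen in the question.

```csharp
using System;
using System.Xml;

public static class RelativeXPathDemo
{
    // Hypothetical page: two links inside the container, one outside it.
    private const string Sample =
        "<body><div id='some_id'>" +
        "<p><span><a href='/one'>1</a></span></p>" +
        "<p><a href='/two'>2</a></p>" +
        "</div><a href='/outside'>3</a></body>";

    // Evaluates xpath with the first child of the container as context node.
    public static int CountFromFirstChild(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode container = doc.SelectSingleNode("//div[@id='some_id']");
        XmlNode firstChild = container.FirstChild; // the first <p>
        return firstChild.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountFromFirstChild("a"));    // direct children only
        Console.WriteLine(CountFromFirstChild("/a"));   // <a> as document root
        Console.WriteLine(CountFromFirstChild("//a"));  // whole document
        Console.WriteLine(CountFromFirstChild(".//a")); // descendants of the context node
    }
}
```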

Extracting table using Htmlagilitypack + LINQ + Lambda

I'm having some difficulties using a lambda expression to parse an html table.
var cells = htmlDoc.DocumentNode
.SelectNodes("//table[@class='data stats']/tbody/tr")
.Select(node => new { playerRank = node.InnerText.Trim()})
.ToList();
foreach (var cell in cells)
{
Console.WriteLine("Rank: " + cell.playerRank);
Console.WriteLine();
}
I'd like to continue to use the syntax as
.Select(node => new { playerRank = node.InnerText.Trim()
but for the other columns of the table, such as player name, team, position, etc. I'm using XPath, so I am unsure if it's correct.
I'm having an issue finding out how to extract the link + player name from:
Steven Stamkos
The Xpath for it is:
//*[@id="fullPage"]/div[3]/table/tbody/tr[1]/td[2]/a
Can anyone help out?
EDIT* added HTML page.
http://www.nhl.com/ice/playerstats.htm?navid=nav-sts-indiv#
This should get you started:
var result = (from row in doc.DocumentNode.SelectNodes("//table[@class='data stats']/tbody/tr")
              select new
              {
                  PlayerName = row.ChildNodes[1].InnerText.Trim(),
                  Team = row.ChildNodes[2].InnerText.Trim(),
                  Position = row.ChildNodes[3].InnerText.Trim()
              }).ToList();
The ChildNodes property contains all the cells in a row. The index determines which cell you get.
To get the url from the anchor tag contained in the player name cell:
var result = (from row in doc.DocumentNode.SelectNodes("//table[@class='data stats']/tbody/tr")
              select new
              {
                  PlayerName = row.ChildNodes[1].InnerText.Trim(),
                  PlayerUrl = row.ChildNodes[1].ChildNodes[0].Attributes["href"].Value,
                  Team = row.ChildNodes[2].InnerText.Trim(),
                  Position = row.ChildNodes[3].InnerText.Trim()
              }).ToList();
The Attributes collection is a list of the attributes in an HTML element. We are simply grabbing the value of href.

preventing duplicates when inserting nodes to treeview control

I want to create a hierarchical view of strings based on first two characters.
If the strings are:
AAAA,AAAA,BBDD,AABB,AACC,BBDD,BBEE
I want to create a treeview that looks like this:
AA
    AAAA
    AABB
    AACC
BB
    BBDD
    BBEE
I currently have some code that looks like this (inside a loop over the strings):
TreeNode pfxNode;
if (treeView1.Nodes[pfx]!=null) {
    pfxNode = treeView1.Nodes[pfx];
}
else {
    pfxNode = treeView1.Nodes.Add(pfx);
}
if (!pfxNode.Nodes.ContainsKey(string)) {
    pfxNode.Nodes.Add(string, string + " some info");
}
For some reason this ends up with multiple "AA" nodes at the top level.
What am I missing?
Please, no pre-filtering of the strings; I want to be able to check whether a specific tree node exists based on its key.
Thanks.
else {
    pfxNode = treeView1.Nodes.Add(pfx);
}
There's your mistake: Add(pfx) sets only the node's text, not its key, so the next lookup by key (Nodes[pfx] or ContainsKey()) won't find it. Fix:
pfxNode = treeView1.Nodes.Add(pfx, pfx);
Use this:
var q = from s in arr
        group s by s.Substring(0, 2) into g
        select new
        {
            Parent = g.Key,
            Children = g.Select (x => x).Distinct()
        };
foreach (var item in q)
{
    var p = new TreeNode(item.Parent);
    TreeView1.Nodes.Add(p);
    foreach (var item2 in item.Children)
        p.Nodes.Add(new TreeNode(item2));
}
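To see what the grouping step produces without a WinForms TreeView, here is a console-only sketch using the sample strings from the question. The GroupByPrefix helper is a hypothetical name, and the sketch assumes every string has at least two characters, since Substring(0, 2) throws otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixGroupingDemo
{
    // Groups the strings by their first two characters and removes
    // duplicate children within each group.
    public static Dictionary<string, List<string>> GroupByPrefix(IEnumerable<string> items)
    {
        return items
            .GroupBy(s => s.Substring(0, 2))
            .ToDictionary(g => g.Key, g => g.Distinct().ToList());
    }

    public static void Main()
    {
        var input = "AAAA,AAAA,BBDD,AABB,AACC,BBDD,BBEE".Split(',');
        foreach (var pair in GroupByPrefix(input))
        {
            Console.WriteLine(pair.Key);
            foreach (var child in pair.Value)
                Console.WriteLine("    " + child);
        }
    }
}
```

Each dictionary entry corresponds to one top-level node and its children, so duplicates never reach the tree in the first place.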
