How to parse text from anonymous block in AngleSharp? - c#

I'm parsing site content using AngleSharp and I've run into an issue with an anonymous block.
See the sample code:
var parser = new HtmlParser();
var document = parser.Parse(@"<body>
<div class='product'>
<a href='#'><img src='img1.jpg' alt=''></a>
Hello, world
<div class='comments-likes'>1</div>
</div>
<div class='product'>
<a href='#'><img src='img2.jpg' alt=''></a>
Yet another helloworld
<div class='comments-likes'>25</div>
</div>
</body>");
var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
    var productTitle = product.Text();
    productTitle.Dump();
}
So, productTitle contains numbers from div.comments-likes, output is:
Hello, world 1
Yet another helloworld 25
I've tried something like product.FirstElementChild.NextElementSibling.Text(); but the next element sibling of the link is div.comments-likes, not the anonymous block. It shows:
1
25
So, anonymous blocks are skipped. :(
The best workaround I've found is removing the blocks that get in the way; for my example:
product.QuerySelector(".comments-likes").Remove();
var productTitle = product.Text().Trim();
Is there a better way to parse text from an anonymous block?

Text is modeled as a text node, which is one kind of node alongside elements, comment nodes, processing instructions, etc. That's why the NextElementSibling you tried didn't include the text in the result: as the name suggests, it returns elements only.
You can get the text nodes located directly within the product div by traversing the div's ChildNodes and filtering by NodeType, for example:
var products = document.QuerySelectorAll("div.product");
foreach (var product in products)
{
    var productTitle = product.ChildNodes
        .First(o => o.NodeType == AngleSharp.Dom.NodeType.Text
                    && o.TextContent.Trim() != "");
    Console.WriteLine(productTitle.TextContent.Trim());
}
dotnetfiddle demo
Notice that newlines between elements are also text nodes, so we need to filter those out in the demo above.

Related

2sxc Get list of fields of an entity

Let's say I have an entity "Cars" with two fields "Brand" and "Model".
In a C# template, is it possible to dynamically get the names of the fields inside "Cars" as a list? The output would need to be {"Brand", "Model"}.
Even further, is it possible to get the description and a specific translation of the field name and description?
Using Daniel's comments and circling around to just answer your original question, here it is simplified and things are split up a little to see the parts:
@inherits ToSic.Sxc.Dnn.RazorComponent
@using Newtonsoft.Json
@{
    var myData = AsList(Data);
    var myDatum = AsEntity(myData.First());
    var myFieldNames = (myDatum.Type.Attributes as IEnumerable<dynamic>).Select(a => a.Name);
}
<pre>
myDatum.Type.Name = @myDatum.Type.Name
myFieldNames = @JsonConvert.SerializeObject(myFieldNames)
</pre>
Which then outputs just:
myDatum.Type.Name = Cars
myFieldNames = ["Name","Brand","Model"]
I think what you are trying to do is covered in the tutorials pretty well.
https://2sxc.org/dnn-tutorials/en/razor
In particular take a look at the LINQ examples; numbers 6, 7, and 8.
Hi @Joao
Basically you should check Jeremy's answer for most of your question.
I believe you're also asking about showing the labels like Brand in the Razor using the field-label from the ContentType specs. This is possible, but it's a bit harder as it's not a common use case. So let me just point you in the right direction...
Each entity has a property called Type. In Razor you would get this using
var someType = AsEntity(yourThing).Type;
This is an IContentType https://docs.2sxc.org/api/dot-net/ToSic.Eav.Data.IContentType.html.
To get the properties and the names of them you would go to
var attr = someType.Attributes["TheName"];
which gives you an IContentTypeAttribute https://docs.2sxc.org/api/dot-net/ToSic.Eav.Data.IContentTypeAttribute.html
This has Metadata - so
var metadata = someType.Attributes["TheName"].Metadata;
The metadata is an IMetadataOf https://docs.2sxc.org/api/dot-net/ToSic.Eav.Metadata.IMetadataOf.html
So using this you can find everything you want - but as you can see it's quite a hoop to jump through.
Here is a simple working example. I am sorta hoping Daniel chimes in and reveals an easy way to go from myType.Attributes and convert straight to a Json string??
Create a new View, Enable List, point it to your Cars Content-Type, fix the last few lines so that the "And the data..." part matches your CT's actual fields.
@inherits ToSic.Sxc.Dnn.RazorComponent
@{
    var myData = AsList(Data);
    var myType = AsEntity(myData.First()).Type;
    var myFields = new List<string>();
    foreach(var field in myType.Attributes) {
        myFields.Add(field.Name);
    }
}
<div @Edit.TagToolbar(Content)>
    <h3>View Heading</h3>
    <h4>Table (Content Type) Name: @myType.Name</h4>
    <p>has the following fields</p>
    <div class="d-flex flex-row bd-highlight mb-3">
        @foreach(var field in myType.Attributes) {
            <p class="p-2 bd-highlight"><strong>@field.Name</strong></p>
        }
    </div>
    <h4>As a comma separated list?</h4>
    <p>@string.Format("{{\"{0}\"}}", string.Join("\",\"", myFields))</p>
    <h4>And the data...</h4>
    @foreach(var cont in AsList(Data)) {
        <div class="d-flex flex-row bd-highlight mb-3"
            @Edit.TagToolbar(cont)>
            <div class="p-2 bd-highlight">@cont.EntityId</div>
            <div class="p-2 bd-highlight">@cont.Name</div>
            <div class="p-2 bd-highlight">@cont.Brand</div>
            <div class="p-2 bd-highlight">@cont.Model</div>
        </div>
    }
</div>
The Cars Content type with only 3 fields
And the output of the View looks like this
For an update on what I needed to achieve, this outputs the field name and the label for an easy loop:
var myData = AsList(App.Data["Stages"]);
var myDatum = AsEntity(myData.First());
var myFields = (myDatum.Type.Attributes as IEnumerable<dynamic>);
// var myFieldNames = myFields.Select(a => a.Name);
// var myFieldLabels = myFields.Select(a => (a.Metadata as IEnumerable<dynamic>).First().Title.TypedContents);
var myFieldNamesAndLabels = myFields.Select(i => new
{
    i.Name,
    // an anonymous-type member built from a method call needs an explicit name
    Label = (i.Metadata as IEnumerable<dynamic>).First().GetBestTitle()
});
If there is an easier way to achieve this, please let me know.
Thanks @Jeremy Farrance and @iJungleBoy

Verify Order Of HTML Elements With Attribute Values Such as Class="Group0-Item1" Class="Group0-Item2" Class="Group1-Item1"

In my Selenium/C#/NUNIT project, I need to find a way to validate the order (top down hierarchy of a page's HTML) for a group of HTML elements (as well as the elements contained within those groups). These are my elements that show inside my page's HTML...
<div class="gapBanner-banner-order1-group0"></div>
<div class="gapBanner-banner-order1-group1"></div>
<div class="gapBanner-banner-order1-group2"></div>
<div class="gapBanner-banner-order2-group2"></div>
The validation I want to perform should be able to catch the following bugs:
Bug 1: The groups are not in order within the page's HTML. One of the elements that is in group1 appears first in the HTML before group0...
<div class="gapBanner-banner-order1-group1"></div>
<div class="gapBanner-banner-order1-group0"></div>
<div class="gapBanner-banner-order1-group2"></div>
<div class="gapBanner-banner-order2-group2"></div>
Bug 2: The elements WITHIN each group are not in order within the page's HTML. Group2-Order2 appears before Group2-Order1 within the HTML...
<div class="gapBanner-banner-order1-group0"></div>
<div class="gapBanner-banner-order1-group1"></div>
<div class="gapBanner-banner-order2-group2"></div>
<div class="gapBanner-banner-order1-group2"></div>
Below is what I have coded so far, but it is definitely not going to do the job, not to mention that it is very messy. I can't figure out what kind of logic I need for this.
/// 5. Verify the correct order of elements in which they appear inside the HTML
List<IWebElement> CustomPageHTMLComponents = Browser.
FindElements(By.XPath("//div[contains(@class, 'group')]")).ToList();
List<IWebElement> uniqueGroups = new List<IWebElement>();
// Get the unique groups
for (int i = 0; i < CustomPageHTMLComponents.Count; i++)
{
IWebElement currentComponent = Browser.FindElements(By.XPath("//div[contains(@class, 'group')]"))[i];
string toBeSearched = "group";
string currentComponenetClassAttributeValue = currentComponent.GetAttribute("class");
int x = currentComponenetClassAttributeValue.IndexOf(toBeSearched);
string groupNumber = currentComponenetClassAttributeValue.Substring(x + toBeSearched.Length);
if (groupNumber == i.ToString())
{
uniqueGroups.Add(currentComponent);
}
}
// Some kind of logic to verify everything???
for (int i = 0; i < Page.CustomPageHTMLComponents.Count; i++)
{
IWebElement currentComponent = Browser.FindElements(By.XPath("//div[contains(@class, 'group')]"))[i];
string toBeSearched = "group";
string currentComponenetClassAttributeValue = currentComponent.GetAttribute("class");
int x = currentComponenetClassAttributeValue.IndexOf(toBeSearched);
string groupNumber = currentComponenetClassAttributeValue.Substring(x + toBeSearched.Length);
Assert.AreEqual(groupNumber, i.ToString());
}
There are probably a number of ways to do this. This is the first way I came up with...
Grab all the class names from the desired elements and store them in string array #1
Make a copy of string array #1 and sort it
Compare the two arrays and if they are equal, then they were sorted to start with
I've checked the HTML you provided for the bugs you'd like to catch and it catches them all. The one issue I can think of is if you get more than 9 orders or groups the sorting will not be what you want because it's alpha order not numerical order, e.g. 1, 10, 2, ... instead of 1, 2, ... 10.
// capture the class names from the desired classes
string[] elements = _driver.FindElements(By.CssSelector("div[class^='gapBanner-banner-']")).Select(e => e.GetAttribute("class")).ToArray();
// make a copy of the array
string[] sortedElements = new string[elements.Length];
elements.CopyTo(sortedElements, 0);
// sort the copy
Array.Sort(sortedElements);
// compare the arrays for order using NUnit CollectionAssert
CollectionAssert.AreEqual(elements, sortedElements, "Verify ordering of elements");
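The alphabetical-ordering caveat mentioned above (1, 10, 2, ...) could be avoided by comparing the numbers instead of the raw strings. The following is only a sketch under assumptions: the class names always contain "order<N>-group<M>" as in the question, and groups are compared first with the order number as a tie-breaker, which is how the question describes the expected layout. The names BannerOrder, ParseKey and IsInOrder are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BannerOrder
{
    // Extracts (group, order) as integers from a class name such as
    // "gapBanner-banner-order1-group0". Hypothetical helper, not a Selenium API.
    public static (int Group, int Order) ParseKey(string className)
    {
        Match m = Regex.Match(className, @"order(\d+)-group(\d+)");
        if (!m.Success)
            throw new FormatException("Unexpected class name: " + className);
        return (int.Parse(m.Groups[2].Value), int.Parse(m.Groups[1].Value));
    }

    // True when the names are ordered by group first, then by order within a group.
    // Assumption: that is the intended ordering; swap the keys to sort by order first.
    public static bool IsInOrder(string[] classNames)
    {
        var keys = classNames.Select(ParseKey).ToArray();
        var sorted = keys.OrderBy(k => k.Group).ThenBy(k => k.Order).ToArray();
        return keys.SequenceEqual(sorted);
    }

    public static void Main()
    {
        // In a real test these strings would come from FindElements(...).
        var names = new[]
        {
            "gapBanner-banner-order2-group0",
            "gapBanner-banner-order10-group0",
            "gapBanner-banner-order1-group1"
        };
        Console.WriteLine(IsInOrder(names));
    }
}
```

In an NUnit test the result could be passed to Assert.IsTrue in place of the CollectionAssert comparison.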

How to use the FindElements command correctly?

Below I have three div tags that all have the same class:
<div class="confirmation-price-summary__price-label confirmation-price-summary__price-label--with-dropdown">(2) Seats</div>
<div class="confirmation-price-summary__price-label confirmation-price-summary__price-label--with-dropdown">(2) Meals</div>
<div class="confirmation-price-summary__price-label confirmation-price-summary__price-label--with-dropdown firefinder-match">(1) Extra Baggage</div>
I create a variable to point to the all of those class elements via xpath in my 'confirmationResponsiveElements.cs' page:
public static By TravelEssentialsBasketLabels => By.XPath("//*[@class='confirmation-price-summary__price-label confirmation-price-summary__price-label--with-dropdown']");
I want to use the 'FindElements' method to find all of these elements and then assert that they contain 'Seats', 'Meals' and 'Extra Baggage'. However I am not sure how to use this correctly as it's giving me the red line of death:
public void TravelEssentialsLabelsSideBasket()
=> _driver.FindElements(ConfirmationResponsiveElements.TravelEssentialsBasketLabels).ToString();
What is the correct way to use FindElements, and how should Assert.IsTrue be written if I want to check that the elements contain 'Seats', 'Meals' and 'Extra Baggage'?
Thanks
List<string> actualoptions = new List<string>();
List<string> expectedoptions = new List<string>();
expectedoptions.Add("Seats");
expectedoptions.Add("Meals");
expectedoptions.Add("Extra Baggage");
ReadOnlyCollection<IWebElement> links = driver.FindElements(By.XPath("//*[@class='confirmation-price-summary__price-label confirmation-price-summary__price-label--with-dropdown']"));
foreach (IWebElement link in links)
{
    string text = link.Text;
    actualoptions.Add(text);
}
// then compare this list with the expected values (SequenceEqual needs "using System.Linq;")
if (actualoptions.SequenceEqual(expectedoptions))
{
    Console.Write("matching");
}
else
{
    Console.Write("not matching");
}
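Note that the label texts in the question carry a count prefix, such as "(2) Seats", so an exact list comparison would not match the plain words. A containment check may be closer to what was asked. The sketch below works on plain strings so it runs without a browser; LabelCheck and MissingLabels are hypothetical names, and in a real test the texts would come from the Text property of each element returned by FindElements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelCheck
{
    // Returns the expected labels that do not appear inside any of the actual texts.
    public static List<string> MissingLabels(
        IEnumerable<string> actualTexts, IEnumerable<string> expectedLabels)
    {
        var actual = actualTexts.ToList();
        return expectedLabels
            .Where(label => !actual.Any(text => text.Contains(label)))
            .ToList();
    }

    public static void Main()
    {
        // Texts copied from the question's markup.
        var actual = new[] { "(2) Seats", "(2) Meals", "(1) Extra Baggage" };
        var expected = new[] { "Seats", "Meals", "Extra Baggage" };

        var missing = MissingLabels(actual, expected);
        Console.WriteLine(missing.Count == 0
            ? "matching"
            : "missing: " + string.Join(", ", missing));
    }
}
```

With NUnit, the same check could be written as Assert.IsTrue(missing.Count == 0, ...) so that the failure message lists the labels that were not found.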

XPath, select multiple elements from multiple nodes in HTML

I just can't figure this one.
I have to search through all nodes that have classes with "item extend featured" values in it (code below). In those classes I need to select every InnerText of <h2 class="itemtitle"> and href value in it, plus all InnerTexts from <div class="title-additional">.
<li class="item extend featured">
<div class="title-box">
<h2 class="itemtitle">
<a target="_top" href="www.example.com/example1/example2/exammple4/example4" title="PC Number 1">PC Number 1</a>
</h2>
<div class="title-additional">
<div class="title-km">150 km</div>
<div class="title-year">2009</div>
<div class="title-price">250 €</div>
</div>
The output should be something like this:
Title:
href:
Title-km:
Title-year:
Title-Price:
--------------
Title:
href:
Title-km:
Title-year:
Title-Price:
--------------
So, the question is, how to traverse through all "item extend featured" nodes in html and select items I need above from each node?
As I understand, something like this should work but it breaks halfway
EDIT: I just noticed, there are ads on the site that share the exact same class and they obviously don't have the elements I need. More problems to think about.
var items1 = htmlDoc.DocumentNode.SelectNodes("//*[@class='item extend featured']");
foreach (var e in items1)
{
    var test = e.SelectSingleNode(".//a[@target='_top']").InnerText;
    Console.WriteLine(test);
}
var page = new HtmlDocument();
page.Load(path);
var lists = page.DocumentNode.SelectNodes("//li[@class='item extend featured']");
foreach (var list in lists)
{
    var link = list.SelectSingleNode(".//*[@class='itemtitle']/a");
    string title = link.GetAttributeValue("title", string.Empty);
    string href = link.GetAttributeValue("href", string.Empty);
    string km = list.SelectSingleNode(".//*[@class='title-km']").InnerText;
    string year = list.SelectSingleNode(".//*[@class='title-year']").InnerText;
    string price = list.SelectSingleNode(".//*[@class='title-price']").InnerText;
    // C# uses {0}-style placeholders, not printf-style %s
    Console.WriteLine("Title: {0}\r\n href: {1}\r\n Title-km: {2}\r\n Title-year: {3}\r\n Title-Price: {4}\r\n\r\n", title, href, km, year, price);
}
What you are trying to achieve requires multiple XPath expressions as you can't return multiple results at different levels using one query (unless you use Union perhaps).
What you might be looking for is something similar to this:
var listItems = htmlDoc.DocumentNode.SelectNodes("//li[@class='item extend featured']");
foreach (var li in listItems) {
    // the leading '.' keeps each query inside the current <li>
    var title = li.SelectNodes(".//h2/a/text()");
    var href = li.SelectNodes(".//h2/a/@href");
    var title_km = li.SelectNodes(".//div[@class='title-additional']/div[@class='title-km']/text()");
    var title_... // other divs
}
Note: code not tested
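When an XPath expression is evaluated from a node inside a loop, a leading // still searches the whole document, while .// stays inside the current node. The sketch below illustrates the difference with the BCL's System.Xml rather than HtmlAgilityPack, purely so that it runs without extra packages; the sample markup and the names RelativeXPathDemo and CountFromFirstItem are made up for illustration.

```csharp
using System;
using System.Xml;

public static class RelativeXPathDemo
{
    // Hypothetical sample markup, kept well-formed so XmlDocument can load it.
    public const string Sample =
        "<ul>" +
        "<li class='item'><h2><a href='/one'>PC Number 1</a></h2></li>" +
        "<li class='item'><h2><a href='/two'>PC Number 2</a></h2></li>" +
        "</ul>";

    // Evaluates the given XPath from the first <li> and counts the matches.
    public static int CountFromFirstItem(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode firstItem = doc.SelectSingleNode("//li[@class='item']");
        return firstItem.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountFromFirstItem("//h2/a"));   // searches the whole document
        Console.WriteLine(CountFromFirstItem(".//h2/a"));  // searches the current <li> only
    }
}
```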

Why does Response.Write() only write the first element in list?

I am trying to write some syndication onto my page,
I use the .Net class to get the rss content into a list
<div>
<%
var r = System.Xml.XmlReader.Create("http://www.huffingtonpost.com/feeds/verticals/small-business/index.xml");
System.ServiceModel.Syndication.SyndicationFeed albums = System.ServiceModel.Syndication.SyndicationFeed.Load(r);
r.Close();
foreach (System.ServiceModel.Syndication.SyndicationItem album in albums.Items)
{
Response.Write(album.Title.Text);
}
%>
</div>
Well the foreach is only functioning as a forfirst here, because it only writes the first SyndicationItem in the list. As you can see, there are many items in that list. Where can be my mistake?
Just to make sure there is not only 1 item in my album list, I did a count on it.
<div>
<%
var r = System.Xml.XmlReader.Create("http://www.huffingtonpost.com/feeds/verticals/small-business/index.xml");
System.ServiceModel.Syndication.SyndicationFeed albums = System.ServiceModel.Syndication.SyndicationFeed.Load(r);
r.Close();
int i = albums.Items.ToList().Count;
Response.Write(i);
/* foreach (System.ServiceModel.Syndication.SyndicationItem album in albums.Items)
{
Response.Write(album.Title.Text);
} */
%>
</div>
Result:
I am wondering if the title output you're seeing is "small business on huffingtonpost.com". If it is, then it is working correctly. You have one item in the list with many entries. Do another iteration inside your current iteration and you should be good to go.
Update
I just pasted your code in a forms page and it came through with all 15 results.
