I am trying to write a program that reads values from a website with GeckoFX. My problem is that I don't get the values I need; the result is null.
The HTML I want to access:
<li id="box" class="tooltip" title="">
<div class="classname"></div>
<span class="value">
<span id="class_test" class="">48.066</span>
</span>
</li>
48.066 is the value I want to read.
I have been searching for a solution for about two days so that I can continue with my private project. I hope someone can help me :)
Solutions I tried:
Test 1:
GeckoElement testelement = null;
testelement = (GeckoElement)Browser.Document.GetElementById("class_test");
string text = testelement.GetAttribute("value");
Test 2:
GeckoHtmlElement testelement = null;
testelement = (GeckoHtmlElement)Browser.Document.GetHtmlElementById("class_test");
string text = testelement.InnerHtml;
If testelement is null, you're not loading the page correctly, or the source is incorrect.
This works fine:
string content = "<html><body><li id=\"box\" class=\"tooltip\" title=\"\">"+
"<div class=\"classname\"></div>" +
"<span class=\"value\">" +
"<span id=\"class_test\" class=\"\">48.066</span>" +
"</span></li></body></html>";
webBrowser1.LoadHtml(content, "http://www.example.com");
And in webBrowser_DocumentCompleted:
GeckoElement testelement = null;
testelement = (GeckoElement)webBrowser1.Document.GetElementById("class_test");
string text = testelement.InnerHtml; // 48.066
Most likely, you need to wait for the document to finish loading before looking for elements.
Browser.DocumentCompleted += (sender, e) =>
{
    var testElement = Browser.Document.GetElementById("class_test") as GeckoElement;
    if (testElement == null)
    {
        return; // element not found in the loaded document
    }
    // The span has no "value" attribute; the number is the element's text.
    string text = testElement.TextContent;
};
I need help because I am not really used to working with HTML. My code displays a web document that is read from an HTML file containing some images.
Just before every image tag, I noticed two tags that produce some wrong characters. An example makes this clearer:
<p ><br clear=all> </span>
<img border=0 width=265 height=105 id="Picture 84856"
src="Test_HTML/image272.jpg"></p>
The rendering is partially correct: it shows the images, but also a lot of wrong ÂÂÂÂÂÂÂÂÂ characters.
So I decided to try to cut the tags out.
I don't know how to do this. Perhaps I am completely wrong, but I think it is a good start, isn't it?
My attempt to remove these tags from an HTML node is:
public void ShowTag(string tag)
{
string innerHtml = "//div[@id='" + tag + "']";
string inner = "//p";
string brToRemove = "//br";
string spanToRemove = "//span";
var nodes = document.DocumentNode.SelectSingleNode(innerHtml);
bool br_deleted = false;
foreach (HtmlNode nd in nodes.SelectNodes(inner))
{
foreach (HtmlNode child in nd.ChildNodes)
{
if (child.Name == "br")
{
int a = 0;
a++;
child.ParentNode.RemoveChild(child);
br_deleted = true;
}
if(child.Name=="span")
{
int b = 0;
b++;
if (br_deleted == true)
{
//nd.ParentNode.RemoveChild(child);
child.Remove();
br_deleted = false;
}
}
}
}
}
but I cannot remove the child. Do you have any idea?
I found where the problem came from: when selecting the right node, I needed to add the headers so that the encoding could be identified.
string innerHtml = "//div[@id='" + tag + "']";
string inner = "//p";
webbrowser.Navigate("about:blank");
LoadDocument();
HtmlNode nodes = document.DocumentNode.SelectSingleNode(innerHtml);
HtmlNode head = document.DocumentNode.SelectSingleNode("/html/head");
head.AppendChild(nodes);
webbrowser.NavigateToString(head.InnerHtml);
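For anyone wondering where the Â characters come from: they are typical of UTF-8 text being decoded with a single-byte encoding. A minimal sketch of that effect, assuming the file contains non-breaking spaces (the usual output for `&nbsp;`) and that the viewer falls back to ISO-8859-1 when no charset is declared; both are assumptions, not something confirmed by the post above:

```csharp
using System;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // A non-breaking space (U+00A0); assumed to be what sits before the images.
        string original = "\u00A0";

        // UTF-8 encodes U+00A0 as the two bytes 0xC2 0xA0.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        // Decoding those bytes as ISO-8859-1 turns each byte into its own character:
        // 0xC2 becomes 'Â' and 0xA0 becomes a non-breaking space.
        string wrong = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);

        // Decoding with the encoding that was actually used restores the text.
        string right = Encoding.UTF8.GetString(utf8Bytes);

        Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C2-A0
        Console.WriteLine(wrong.Length);                      // 2
        Console.WriteLine(wrong[0] == '\u00C2');              // True
        Console.WriteLine(right == original);                 // True
    }
}
```

If this is the cause, keeping the head (with its charset declaration) in the HTML passed to the browser control should be enough, and the tags do not need to be removed at all.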
This is within Sitefinity if that matters, and I am really new at ASP.NET and C#.
I have an image-based navigation element at the bottom of a page that links to different articles using the same template. There are 5 articles, and I would like the link to the active page/article to be hidden so there is a grid of 4 image links.
Here's a screenshot:
https://i.imgur.com/PG2Sfpo.png
Here is the code behind it:
@{
string navTitle = string.Empty;
string url = string.Empty;
if (Model.CurrentSiteMapNode != null && Model.CurrentSiteMapNode.ParentNode != null)
{
if (Model.CurrentSiteMapNode.Title == "Home")
{
navTitle = Model.CurrentSiteMapNode.ParentNode.Title;
}
else
{
navTitle = Model.CurrentSiteMapNode.Title;
}
url = Model.CurrentSiteMapNode.ParentNode.Url;
}
}
<div class="foundation-stories-container">
@foreach (var node in Model.Nodes)
{
@RenderRootLevelNode(node);
}
</div>
@*Here is specified the rendering for the root level*@
@helper RenderRootLevelNode(NodeViewModel node)
{
string[] thisPage = (node.Url).Split('/');
string thisImage = thisPage[4] + ".jpg";
<a href="@node.Url" target="@node.LinkTarget">
<div class="foundation-story-block">
<div class="hovereffect">
<img src="[OUR WEBSITE URL]/stories/@thisImage" class="img-fluid">
<div class="overlay">
<h2>@node.Title</h2>
</div>
</div>
</div>
</a>
}
So we're already getting the page URL and image file name
string[] thisPage = (node.Url).Split('/');
string thisImage = thisPage[4] + ".jpg";
Is this as easy as doing the following?
if (thisImage = thisPage)
{
foundation-story-block.AddClassToHtmlControl("hide")
}
Seems easy enough, but I don't know where to start.
I'm better at Javascript, so I do have a JS solution in place for this already, but I'd really like to find a cleaner way to do it.
<script type="text/javascript">
$(document).ready(function() {
var active = window.location.pathname.split("/").pop()
var name = active;
name = name.replace(/-/g, ' ');
jQuery.expr[":"].Contains = jQuery.expr.createPseudo(function(arg) {
return function( elem ) {
return jQuery(elem).text().toUpperCase().indexOf(arg.toUpperCase()) >=
0;
};
});
$("h2:Contains('" + name + "')").closest(".foundation-story-block").addClass("hide");
});
</script>
This exists on the main template page.
It gets the last part of the URL.
It sets that as a variable called "name".
It changes each dash to a space if there is one (most of the pages are associated with names, so the URL looks like /first-last).
Then it looks at the <h2>, which is where the title of the page lives, and if the title contains the "name" variable, the "hide" class is added to the block.
Thanks for any help anyone can provide.
You could bind a click event to your elements with the foundation-story-block class. I use .on instead of .click because, with UpdatePanels, the click event won't fire after an UpdatePanel has had its update event triggered. You might encounter a similar problem with your dynamic binding, so I used .on to avoid it.
$(".foundation-story-block").on("click", function() {
// Remove the "hide" class from any elements that have it applied
$.each($(".foundation-story-block.hide"), function(index, value) {
// Remove the class using the "this" context from the anonymous function
$(this).removeClass("hide");
});
// Add the "hide" class to the element that was clicked
$(this).addClass("hide");
});
I haven't run this through an IDE, so it might not be 100% correct, but it will put you on the right path.
It is possible, yes. Here is how:
...
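@{
// thisPage is a string[] and thisImage a string, so they cannot be compared directly.
// Compare the node's URL with the current page's URL instead (assumes both use the same format).
var current = Model.CurrentSiteMapNode;
var hiddenClass = current != null && node.Url == current.Url ? "hide" : string.Empty;
}
<div class="foundation-story-block @hiddenClass">
<div class="hovereffect">
<img src="[OUR WEBSITE URL]/stories/@thisImage" class="img-fluid">
<div class="overlay">
<h2>@node.Title</h2>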
</div>
</div>
</div>
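If the two URLs are not in the same format (absolute vs. relative, trailing slash), comparing only the last path segment is more forgiving, much like the JavaScript version in the question. A rough sketch in plain C#; `IsCurrentPage`, `LastSegment` and the example URLs are made up for illustration, and in the view you would presumably pass something like `node.Url` and `Request.Url.AbsolutePath`:

```csharp
using System;

static class ActivePageCheck
{
    // Returns the last non-empty path segment of a URL or path, e.g. "jane-doe".
    static string LastSegment(string urlOrPath)
    {
        // Drop any query string or fragment first.
        string path = urlOrPath.Split('?', '#')[0];
        string[] parts = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length == 0 ? string.Empty : parts[parts.Length - 1];
    }

    // True when both URLs end in the same segment, ignoring case.
    public static bool IsCurrentPage(string nodeUrl, string currentPath)
    {
        return string.Equals(LastSegment(nodeUrl), LastSegment(currentPath),
                             StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // Hypothetical URLs, only to show the comparison.
        Console.WriteLine(IsCurrentPage(
            "https://example.com/foundation/stories/jane-doe",
            "/foundation/stories/jane-doe/")); // True
        Console.WriteLine(IsCurrentPage(
            "https://example.com/foundation/stories/john-roe",
            "/foundation/stories/jane-doe"));  // False
    }
}
```

This only works if the last segment is unique among the five articles, which seems to be the case here since the image file names are built from it.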
I'm currently crawling some web sites and retrieving information from them to store into a database for later use. I'm using HtmlAgilityPack and I've successfully done this for a few sites now but for some reason this one is giving me issues. I'm fairly new to XPath syntax still so I'm probably messing up there.
Here's what the HTML I'm trying to retrieve from the site looks like:
<form ... id="_subcat_ids_">
<input ....>
<ul ...>
<li ....>
<input .....>
<a class="facet-selection multiselect-facets "
.... href="INeedThisHref#1">
Text I Need //need to retrieve this text between the <a></a>
<span class="subtle-note">(2)</span> //I need that number from inside the span
</a>
</li>
<li ....>
<input .....>
<a class="facet-selection multiselect-facets "
.... href="INeedThisHref#2">
Text I Need #2 //need to retrieve this text between the <a></a>
<span class="subtle-note">(6)</span> //I need that number from inside the span
</a>
</li>
Each one of those represents an item on a page, but I'm only interested in what is happening with each <a></a>. I want to retrieve that href value from inside the <a>, then the text between the opening and closing, then I need the text inside the <span>. I left out the stuff inside of the other tags because they do not help uniquely identify each item, the class inside <a> is the only thing they share, and they are all inside of the form with id="_subcat_ids_".
Here's my code:
try
{
string fullUrl = "...";
HtmlWeb web = new HtmlWeb();
ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3 | SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
HtmlDocument html = web.Load(fullUrl);
foreach (HtmlNode node in html.DocumentNode.SelectNodes("//form[@id='_subcat_ids_']")) //this gets me into the form
{
foreach (HtmlNode node2 in node.SelectNodes(".//a[@class='facet-selection multiselect-facets ']")) //this should get me into the <a> tags, but it throws 'object reference not set to an instance of an object'
{
//get the href
string tempHref = node2.GetAttributeValue("href", string.Empty);
//get the text between <a>
string tempCat = node2.InnerText.Trim();
//get the text between <span>
string tempNum = node2.SelectSingleNode(".//span[@class='subtle-note']").InnerText.Trim();
}
}
}
catch (Exception ex)
{
Console.Write("\nError: " + ex.ToString());
}
That first foreach loop doesn't error, but the second one gives me object reference not set to an instance of an object at the line where my second foreach loop is. Like I mentioned before, I'm still new to this syntax, I've used this type of method on another website with great success but I'm having some trouble with this site. Any tips would be appreciated.
Well, I figured it out. Here's the code:
foreach (HtmlNode node in html.DocumentNode.SelectNodes("//form[@id='_subcat_ids_']"))
{
//get the categories, store in list
foreach (HtmlNode node2 in node.SelectNodes("..//a[@class='facet-selection multiselect-facets ']//text()[normalize-space() and not(ancestor::span)]"))
{
string tempCat = node2.InnerText.Trim();
categoryList.Add(tempCat);
Console.Write("\nCategory: " + tempCat);
}
foreach (HtmlNode node3 in node.SelectNodes("..//a[@class='facet-selection multiselect-facets ']"))
{
//get href for each category, store in list
string tempHref = node3.GetAttributeValue("href", string.Empty);
LinkCatList.Add(tempHref);
Console.Write("\nhref: " + tempHref);
//get the number of items from categories, store in list
string tempNum = node3.SelectSingleNode(".//span[@class='subtle-note']").InnerText.Trim();
//strip the surrounding parentheses, e.g. "(2)" becomes "2"
tempNum = tempNum.Replace("(", "").Replace(")", "");
Console.Write("\nNumber of items: " + tempNum + "\n\n");
}
}
Works like a charm.
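A possible explanation for why `..//a` works where `.//a` did not: some HtmlAgilityPack versions treat `<form>` as an element that cannot contain children, so the links end up as siblings of the form instead of descendants, and `SelectNodes` returns null when nothing matches. A sketch under that assumption; the sample HTML and its values are invented, and removing the `form` entry from `HtmlNode.ElementsFlags` is the commonly suggested workaround, not something verified against this site:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class FormChildrenDemo
{
    static void Main()
    {
        // Assumption: the parser flags <form> as empty. Removing the flag before
        // loading keeps the children nested; it is a no-op if the entry is absent.
        HtmlNode.ElementsFlags.Remove("form");

        var html = new HtmlDocument();
        html.LoadHtml(
            "<form id='_subcat_ids_'><ul><li>" +
            "<a class='facet-selection multiselect-facets ' href='/cat-1'>" +
            "Shoes <span class='subtle-note'>(2)</span></a>" +
            "</li></ul></form>");

        HtmlNode form = html.DocumentNode.SelectSingleNode("//form[@id='_subcat_ids_']");
        if (form == null) return;

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = form.SelectNodes(".//a[contains(@class, 'facet-selection')]");
        if (links == null)
        {
            Console.WriteLine("No links found inside the form.");
            return;
        }

        foreach (HtmlNode a in links)
        {
            string href = a.GetAttributeValue("href", string.Empty);

            // Only the link's own text nodes, so the span's "(2)" is not included.
            string category = string.Concat(
                a.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Text)
                            .Select(n => n.InnerText)).Trim();

            HtmlNode span = a.SelectSingleNode(".//span[@class='subtle-note']");
            string count = span == null ? string.Empty : span.InnerText.Trim().Trim('(', ')');

            Console.WriteLine(category + " | " + href + " | " + count);
        }
    }
}
```

With the children nested properly, one loop can collect the href, the text and the count together instead of keeping them in parallel lists.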
This code works with some websites, but with others it fails with an error like this, and I do not know how to fix it (the error location was marked with stars).
var document = webBrowser1.Document;
var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)document.DomDocument;
var htmlString = documentAsIHtmlDocument3.documentElement.innerHTML;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);
// Use the nodes to get the information
HtmlNodeCollection texts = doc.DocumentNode.SelectNodes("//div[@id='footer']/p");
string kq = "";
// Loop to collect the results
foreach (var item in texts)
{
kq += item.InnerText + Environment.NewLine;
}
richTextBox1.Text = kq;
HTML code:
<div id="divTop" >
<div id="text-conent" style="width: 500px; float: right;"></div>
<div id="grid" style="margin-removed 505px; height: 700px;"></div>
</div>
It seems that on the pages where this is successful there exists a div with the id of footer
But on other pages where this fails no such div exists.
So it seems like your logic may need to change to make the search expression passed to doc.DocumentNode.SelectNodes more forgiving.
Alternatively create a few more search strings that would work if your original fails:
if(texts == null){
texts = doc.DocumentNode.SelectNodes("some other search string");
}
etc.
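The fallback idea could look roughly like this. `SelectFirstMatch` is a made-up helper and the second XPath is only a placeholder for whatever suits the other pages; the one thing relied on is that `SelectNodes` returns null when nothing matches:

```csharp
using System;
using HtmlAgilityPack;

static class XPathFallback
{
    // Tries each XPath in order and returns the first result that matched, or null.
    public static HtmlNodeCollection SelectFirstMatch(HtmlDocument doc, params string[] xpaths)
    {
        foreach (string xpath in xpaths)
        {
            HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
            if (nodes != null)
            {
                return nodes;
            }
        }
        return null;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // Sample page without a "footer" div, similar to the failing case.
        doc.LoadHtml("<div id='divTop'><div id='text-conent'><p>Hello</p></div></div>");

        HtmlNodeCollection texts = SelectFirstMatch(
            doc,
            "//div[@id='footer']/p",         // original expression
            "//div[@id='text-conent']//p");  // hypothetical fallback

        if (texts == null)
        {
            Console.WriteLine("No matching nodes on this page.");
            return;
        }

        foreach (HtmlNode item in texts)
        {
            Console.WriteLine(item.InnerText);
        }
    }
}
```

Whatever expressions you end up using, keep the final null check before the foreach, since that is what throws on pages that match none of them.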
Let's say this is my HTML code:
<a class="" data-tracking-id="0_Motorola"
href="/motorola?otracker=nmenu_sub_electronics_0_Motorola">
Motorola
</a>
I used C# code to find the href value like this:
var tags = htmlDoc.DocumentNode.SelectNodes("//div[@class='top-menu unit']//ul//li//div[@id='submenu_electronics']//a");
if (tags != null)
{
foreach (var t in tags)
{
var name = t.InnerText.Trim();
var url =t.Attributes["href"].Value;
}
}
I am getting url='/motorola', but I need url='/motorola?otracker=nmenu_sub_electronics_0_Motorola'.
The text after ? and & is not included. Please clarify where I went wrong.
I have used HtmlAgilityPack in the past, and I used it like this:
var url = t.GetAttributeValue("href","");
You can try that and see if it works.
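A small self-contained check of that call, using the HTML from the question plus a link without an href (added here only to show the default value). The `HtmlEntity.DeEntitize` step is an extra assumption, in case the real page writes `&` as `&amp;` inside the attribute:

```csharp
using System;
using HtmlAgilityPack;

class HrefDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<a class='' data-tracking-id='0_Motorola' " +
            "href='/motorola?otracker=nmenu_sub_electronics_0_Motorola'>Motorola</a>" +
            "<a>No link</a>");

        foreach (HtmlNode a in doc.DocumentNode.SelectNodes("//a"))
        {
            // Returns the default (empty string) when the attribute is missing,
            // whereas a.Attributes["href"].Value would throw in that case.
            string raw = a.GetAttributeValue("href", string.Empty);

            // Decode entities such as &amp; that may appear in the raw attribute.
            string url = HtmlEntity.DeEntitize(raw);

            Console.WriteLine(a.InnerText.Trim() + " -> " + url);
        }
    }
}
```

If the query string is still missing after this, it is worth checking whether the HTML being parsed is really the same as what the browser shows, since the value is returned as it appears in the source.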