I'm looking to get the *.aspx page name from the parent of an IHTMLElement. I started looking through the attributes on an IHTMLElement, and the document property looked promising.
Do I just need to cast as follows?
IHTMLElement elem;
elem = getElement(args);
IHTMLElement2 dom = (IHTMLElement2)elem.document;
string aspx = dom.<something?>;
That doesn't appear to work, but I feel like I'm on the right track. Ideas?
HTMLDocument doc = somedoc;
Regex pullASPX = new Regex(@"(?<=\/)[^/]*?(?=\.aspx)");
if (elem != null && !doc.url.Contains("default.aspx"))
{
EchoAbstraction.page = pullASPX.Match(doc.url).Value;
EchoAbstraction.tag = tagName;
EchoAbstraction.id = elem.id;
}
This is how I ended up doing it. I had found the ID in the dom already, so I just pulled the current doc page and parsed the URL.
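As a minimal sketch of the URL-parsing step described above, the same lookaround pattern can be tried on its own, next to an alternative that uses Uri and Path instead of a regex. The class name PageNameExtractor, its method names, and the example URL are hypothetical and not part of the original code.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class PageNameExtractor
{
    // Same idea as the pattern above: the text after a '/' and before ".aspx".
    private static readonly Regex PullAspx = new Regex(@"(?<=/)[^/]*?(?=\.aspx)");

    // Returns the page name, or an empty string when the URL has no .aspx page.
    public static string FromRegex(string url)
    {
        return PullAspx.Match(url).Value;
    }

    // Alternative without a regex: let Uri isolate the path, then drop the extension.
    public static string FromUri(string url)
    {
        var uri = new Uri(url);
        return Path.GetFileNameWithoutExtension(uri.AbsolutePath);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical URL, used only for illustration.
        const string url = "http://example.com/app/orders.aspx?id=42";
        Console.WriteLine(PageNameExtractor.FromRegex(url)); // orders
        Console.WriteLine(PageNameExtractor.FromUri(url));   // orders
    }
}
```

The Uri variant ignores the query string, so it is probably the safer choice if a query parameter can itself contain ".aspx".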
Related
I need to create a complete IHTMLDocument2 document, and I ended up with this snippet, which works. However, the url property seems to be ignored every time.
string page = "my HTML code in string";
IHTMLDocument2 doc2 = (IHTMLDocument2)new HTMLDocument();
doc2.url = "www.stackoverflow.com";
doc2.write(new object[] { page });
doc2.close();
while (doc2.body == null)
Application.DoEvents();
Now doc2.url is always "about:blank". How can I set this URL property?
Thank you in advance.
I'm trying to get all the languages from Google Translate. When I open Developer Tools and click one of the languages in the list (shown after clicking the arrow), it gives //*[@id=':7']/div/text() for Arabic, but I get null when I try to select that node:
async Task AddLanguages()
{
try
{
// //*[@id=":6"]/div/text()
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
for (int i = 6; i <= 9; i++)
{
//*[@id=":6"]/div/text() //*[@id=":6"]/div/div
Debug.WriteLine(i);
var element = document.DocumentNode.SelectSingleNode("//*[@id=':7']/div/text()");
Trace.WriteLine(element == null, "Element is null");
}
}
catch (Exception e)
{
await this.ShowMessageAsync("Error!", "An error occurred while loading the languages.");
}
}
The output is always "Element is null: True". (I was trying to use a for loop to go through the languages, but it doesn't even work for a single one!)
I guess your XPath is wrong. You can try something like:
string Url = "https://translate.google.com/";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
var arabic = doc.DocumentNode.Descendants("div").FirstOrDefault(_ => _.ChildNodes.Any(node => node.Name.Equals("#text") && node.InnerText.Equals("Arabic")));
Since I can't comment yet... Have you tried clicking on the dropdown first before looking for the elements?
Clicking on //*[@id='gt-sl-gms'] or its inner div would make the elements visible.
That should work.
Anyway, I can't make $x work in the Google Chrome console. I'm currently getting an Uncaught TypeError. Not sure if that has anything to do with it.
Edit: Oh wait, I think I know your problem. On closer inspection, the element (div) has another div before the text, so try //*[@id=':7']/div/text()[2]
This is part of the HTML that I am parsing:
<li>http://some.link.com/4DFR6DJ43Y/sessionid?ticket=ASDSIDFK32423421</li>
I want to get http://some.link.com/4DFR6DJ43Y/sessionid?ticket=ASDSIDFK32423421 as an output.
So far I have tried:
HtmlDocument document = new HtmlDocument();
document.LoadHtml(responseFromServer);
var link = document.DocumentNode.SelectSingleNode("//a");
if (link != null)
{
if(link.innerText.Contains("ticket"))
{
Console.WriteLine(link.InnerText);
}
}
... but output is null (no inner texts are found).
That's probably because the first link in your HTML document, which is what SelectSingleNode() returns, doesn't contain the text "ticket". You can check for the target text directly in the XPath, like so:
var link = document.DocumentNode.SelectSingleNode("//a[contains(.,'ticket')]");
if (link != null)
{
Console.WriteLine(link.InnerText);
}
Or, using LINQ style if you like:
var link = document.DocumentNode
.SelectNodes("//a")
.OfType<HtmlNode>()
.FirstOrDefault(o => o.InnerText.Contains("ticket"));
if (link != null)
{
Console.WriteLine(link.InnerText);
}
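To illustrate the difference between taking the first link and filtering inside the XPath, here is a minimal sketch that uses the standard System.Xml classes as a stand-in for HTML Agility Pack, so it runs without extra packages. The contains() predicate is the same one as above. The class name LinkFinder and the sample markup are hypothetical, and the input has to be well-formed XML for XmlDocument to load it.

```csharp
using System;
using System.Xml;

public static class LinkFinder
{
    // Returns the text of the first <a> whose string value contains `needle`,
    // or null when no such link exists.
    // Note: `needle` is pasted into the XPath, so it must not contain a single quote.
    public static string FirstLinkContaining(string xml, string needle)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode link = doc.SelectSingleNode("//a[contains(., '" + needle + "')]");
        return link?.InnerText;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical markup: the first link does not contain "ticket".
        const string xml =
            "<ul>" +
            "<li><a href='/home'>Home</a></li>" +
            "<li><a href='/s'>http://some.link.com/sessionid?ticket=ABC</a></li>" +
            "</ul>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // A plain "//a" stops at the first link in document order.
        Console.WriteLine(doc.SelectSingleNode("//a").InnerText); // Home

        // Filtering in the XPath skips links without the target text.
        Console.WriteLine(LinkFinder.FirstLinkContaining(xml, "ticket"));
    }
}
```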
You provided a piece of code that won't compile because innerText is not defined. If you try this code, you'll probably get what you asked for:
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);
var link = document.DocumentNode.SelectSingleNode("//a");
if (link != null)
{
if(link.InnerText.Contains("ticket"))
{
Console.WriteLine(link.InnerText);
}
}
You can use HTML Agility Pack instead of HTML Document, which lets you do deep parsing of the HTML. For more information, see the following link:
How to use HTML Agility Pack
Normally I can access HTML tags and set their values from code like this:
HtmlElementCollection coll = webBrowser1.Document.GetElementsByTagName("input");
foreach (HtmlElement curElement in coll)
{
if (curElement.GetAttribute("name").ToString() == "login")
{
curElement.SetAttribute("Value", "123456789");
}
}
But if the input element is inside an iframe, this code doesn't work.
So I changed the line:
var coll = webBrowser1.Document.GetElementsByTagName("input");
to
var coll = webBrowser1.Document.Window.Frames[0].Document.GetElementsByTagName("iframe");
But it still doesn't work. Please help. I have been stuck on this problem since last week.
Your code looks like it should work if you change this line:
coll = webBrowser1.Document.Window.Frames[0].Document.GetElementsByTagName("iframe")
to
coll = webBrowser1.Document.Window.Frames[0].Document.GetElementsByTagName("input");
Here is a working example...
HtmlWindow iframe = webBrowser1.Document.Window.Frames[0];
HtmlElement input = iframe.Document.GetElementsByTagName("input")[0];
input.SetAttribute("value", "Test");
This obviously assumes that you have at least one iframe element and at least one input element in the child document.
I'm trying to parse this field, but can't get it to work. Current attempt:
var name = doc.DocumentNode.SelectSingleNode("//*[@id='my_name']").InnerHtml;
<h1 class="bla" id="my_name">namehere</h1>
Error: Object reference not set to an instance of an object.
Appreciate any help.
@John - I can assure you that the HTML is loaded correctly. I am trying to read my Facebook name for learning purposes. Here is a screenshot from the Firebug plugin. The version I am using is 1.4.0.
http://i54.tinypic.com/kn3wo.jpg
I guess the problem is that profile_name is a child node or something, that's why I'm not able to read it?
The reason your code doesn't work is because there is JavaScript on the page that is actually writing out the <h1 id='profile_name'> tag, so if you're requesting the page from a User Agent (or via AJAX) that doesn't execute JavaScript then you won't find the element.
I was able to get my own name using the following selector:
string name =
doc.DocumentNode.SelectSingleNode("//a[@id='navAccountName']").InnerText;
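The "Object reference not set to an instance of an object" error in the question comes from reading a property on the result of SelectSingleNode when nothing matched, because the call returns null in that case. A minimal sketch of guarding against that follows, using the standard System.Xml classes as a stand-in for HTML Agility Pack so that it runs without extra packages. The class name XPathLookup and the sample markup are hypothetical.

```csharp
using System;
using System.Xml;

public static class XPathLookup
{
    // Returns the text of the element with the given id, or null when it is absent.
    // Note: `id` is pasted into the XPath, so it must not contain a single quote.
    public static string FindTextById(string xml, string id)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("//*[@id='" + id + "']");

        // The null-conditional operator avoids a NullReferenceException
        // when the element is not in the markup that was actually loaded.
        return node?.InnerText;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical markup modelled on the <h1> from the question.
        const string xml =
            "<html><body><h1 class='bla' id='my_name'>namehere</h1></body></html>";

        Console.WriteLine(XPathLookup.FindTextById(xml, "my_name"));                          // namehere
        Console.WriteLine(XPathLookup.FindTextById(xml, "profile_name") ?? "(not found)");    // (not found)
    }
}
```

If the element is written by JavaScript, as described above, the lookup still returns null. The guard only turns the crash into a case the caller can handle.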
Try this:
var name = doc.DocumentNode.SelectSingleNode("//*[@id='my_name']").InnerHtml;
string name = doc.DocumentNode.SelectSingleNode("//h1[@id='my_name']").InnerText;
public async Task<List<string>> GetAllTagLinkContent(string content)
{
string html = string.Format("<html><head></head><body>{0}</body></html>", content);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var nodes = doc.DocumentNode.SelectNodes("//*[@id='my_name']");
// SelectNodes returns null when nothing matches, so guard before projecting.
return nodes == null ? new List<string>() : nodes.Select(r => r.InnerText).ToList();
}
This works with ("//a[@href]") as well; you can try it as shown above. Hope this helps.