Access in IFrame and SetAttribute value - c#

Normally I can access HTML elements and set their values with code like this:
HtmlElementCollection coll = webBrowser1.Document.GetElementsByTagName("input");
foreach (HtmlElement curElement in coll)
{
    if (curElement.GetAttribute("name").ToString() == "login")
    {
        curElement.SetAttribute("Value", "123456789");
    }
}
But if the input element is inside an iframe, this code doesn't work.
So I changed this line:
var coll = webBrowser1.Document.GetElementsByTagName("input");
to
var coll = webBrowser1.Document.Window.Frames[0].Document.GetElementsByTagName("iframe")
But it still didn't work. Please help; I have been stuck on this problem since last week.

Your code looks like it should work if you change this line:
coll = webBrowser1.Document.Window.Frames[0].Document.GetElementsByTagName("iframe")
to
coll = webBrowser1.Document.Window.Frames[0].Document.GetElementsByTagName("input");
Here is a working example...
HtmlWindow iframe = webBrowser1.Document.Window.Frames[0];
HtmlElement input = iframe.Document.GetElementsByTagName("input")[0];
input.SetAttribute("value", "Test");
This assumes, of course, that the page has at least one iframe element and that the child document has at least one input element.
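The indexing in the example above throws if the page has no frames or the frame has no inputs. A more defensive variant might look like the sketch below. The helper name TrySetInputInFrames and its parameters are hypothetical and not part of the original answer. It assumes a Windows Forms WebBrowser and that it is called after the page (including its frames) has finished loading, for example from a DocumentCompleted handler. Frames loaded from another domain typically refuse access to their document, so the sketch skips those.

```csharp
using System;
using System.Windows.Forms;

static class FrameHelper
{
    // Hypothetical helper: looks through every frame of the loaded page for an
    // <input> whose name attribute equals inputName and sets its value.
    // Returns true if a matching input was found and updated.
    public static bool TrySetInputInFrames(WebBrowser browser, string inputName, string value)
    {
        if (browser.Document == null || browser.Document.Window == null)
            return false; // nothing loaded yet

        foreach (HtmlWindow frame in browser.Document.Window.Frames)
        {
            HtmlDocument frameDoc;
            try
            {
                frameDoc = frame.Document;
            }
            catch (UnauthorizedAccessException)
            {
                continue; // frame content is not accessible (e.g. another domain)
            }

            if (frameDoc == null)
                continue;

            foreach (HtmlElement input in frameDoc.GetElementsByTagName("input"))
            {
                if (input.GetAttribute("name") == inputName)
                {
                    input.SetAttribute("value", value);
                    return true;
                }
            }
        }

        return false;
    }
}
```

With the names from the question, the call would be FrameHelper.TrySetInputInFrames(webBrowser1, "login", "123456789").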

Related

Inserting Custom Element with AngleSharp

I'm trying to update a site that uses a sanitizer based on AngleSharp to process user-generated HTML content. The site's users need to be able to embed iframes, and I am trying to use a whitelist to control which domains a frame can load. I'd like to rewrite the 'blocked' iframes to a new custom element, "blocked-iframe", which will then be stripped out by the sanitizer, so we can review whether other domains need to be added to the whitelist.
I'm trying to use a solution based on this answer: https://stackoverflow.com/a/55276825/794
It looks like so:
string BlockIFrames(string content)
{
    var parser = new HtmlParser(new HtmlParserOptions { });
    var doc = parser.Parse(content);
    foreach (var element in doc.QuerySelectorAll("iframe"))
    {
        var src = element.GetAttribute("src");
        if (string.IsNullOrEmpty(src) || !Settings.Sanitization.IFrameWhitelist.Any(wls => src.StartsWith(wls)))
        {
            var newElement = doc.CreateElement("blocked-iframe");
            foreach (var attr in element.Attributes)
            {
                newElement.SetAttribute(attr.Name, attr.Value);
            }
            element.Insert(AdjacentPosition.BeforeBegin, newElement.OuterHtml);
            element.Remove();
        }
    }
    return doc.FirstElementChild.OuterHtml;
}
It ostensibly works, but I notice that the angle brackets in the new element's tag are escaped on insertion, so the result is written into the page as text. I think I could build a map of replacements and apply them to the string before sending it back, but I'm wondering if there's a way to do it using AngleSharp's API. The site currently uses 0.9.9, and I'm not sure how far ahead we'll be able to update, considering some of the other dependencies in play.
Digging around in the source, I found the ReplaceChild method on INode, which works if called on the parent of the element:
string BlockIFrames(string content)
{
    var parser = new HtmlParser(new HtmlParserOptions { });
    var doc = parser.Parse(content);
    foreach (var element in doc.QuerySelectorAll("iframe"))
    {
        var src = element.GetAttribute("src");
        if (string.IsNullOrEmpty(src) ||
            !Settings.Sanitization.IFrameWhitelist.Any(wls => src.StartsWith(wls)))
        {
            var newElement = doc.CreateElement("blocked-iframe");
            foreach (var attr in element.Attributes)
            {
                newElement.SetAttribute(attr.Name, attr.Value);
            }
            element.Parent.ReplaceChild(newElement, element);
        }
    }
    return doc.FirstElementChild.OuterHtml;
}
I will keep testing, but this seems decent enough to me. If there is a better way, I'd love to hear it.

xPath is wrong given by the Browser or HTMLAgilityPack cannot use xPath?

I'm trying to get all the languages from Google Translate. When I open Developer Tools and click one of the languages once the full list is shown (after clicking the arrow), it gives the XPath //*[@id=':7']/div/text() for Arabic, but I get null when I try to select that node:
async Task AddLanguages()
{
    try
    {
        // //*[@id=":6"]/div/text()
        HtmlDocument document = new HtmlDocument();
        document.LoadHtml(html);
        for (int i = 6; i <= 9; i++)
        {
            // //*[@id=":6"]/div/text()  //*[@id=":6"]/div/div
            Debug.WriteLine(i);
            var element = document.DocumentNode.SelectSingleNode("//*[@id=':7']/div/text()");
            Trace.WriteLine(element == null, "Element is null");
        }
    }
    catch (Exception e)
    {
        this.ShowMessageAsync("Error!", "An error occurred while loading the languages.");
    }
}
It outputs "Element is null: True" every time. (I was going to use the for loop to go through the languages, but it doesn't even work for a single one!)
I guess your XPath is wrong. You can try something like:
string Url = "https://translate.google.com/";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
var arabic = doc.DocumentNode.Descendants("div").FirstOrDefault(_ => _.ChildNodes.Any(node => node.Name.Equals("#text") && node.InnerText.Equals("Arabic")));
Since I can't comment yet: have you tried clicking on the dropdown first, before looking for the elements?
Clicking on //*[@id='gt-sl-gms'] or its inner div should make the elements visible.
That should work.
Anyway, I can't get $x to work in the Google Chrome console; I'm currently getting an Uncaught TypeError. Not sure whether that is related.
Edit: Oh wait, I think I know your problem. On closer inspection, the element (div) has another div before the text, so try //*[@id=':7']/div/text()[2]

How to retrieve aspx parent name from IHTMLElement

I'm looking to get the *.aspx page name from the parent of an IHTMLElement. I started looking through the attributes on an IHTMLElement, and the document property looked promising.
Do I just need to cast as follows?
IHTMLElement elem;
elem = getElement(args);
IHTMLElement2 dom = (IHTMLElement2)elem.document;
string aspx = dom.<something?>;
That doesn't appear to work, but I feel like I'm on the right track. Ideas?
HTMLDocument doc = somedoc;
Regex pullASPX = new Regex(@"(?<=\/)[^//]*?(?=\.aspx)");
if (elem != null && !doc.url.Contains("default.aspx"))
{
    EchoAbstraction.page = pullASPX.Match(doc.url).Value;
    EchoAbstraction.tag = tagName;
    EchoAbstraction.id = elem.id;
}
This is how I ended up doing it. I had found the ID in the dom already, so I just pulled the current doc page and parsed the URL.
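To see what that pattern actually captures, here is a small stand-alone sketch that runs the same regular expression against a URL. The class name PageNameDemo, the method GetPageName and the example URL are made up for illustration and do not come from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

static class PageNameDemo
{
    // Same pattern as in the answer: the run of non-slash characters that
    // follows a "/" and is immediately followed by ".aspx".
    static readonly Regex PullAspx = new Regex(@"(?<=\/)[^//]*?(?=\.aspx)");

    // Returns the page name without its extension, or an empty string
    // when the URL contains no .aspx page.
    public static string GetPageName(string url)
    {
        return PullAspx.Match(url).Value;
    }

    static void Main()
    {
        // Hypothetical URL, used only to demonstrate the match.
        Console.WriteLine(GetPageName("http://example.com/app/orders.aspx?id=5")); // prints "orders"
    }
}
```

Because a failed match has an empty Value, a caller that needs to tell "no .aspx page" apart from a real name can check for an empty string.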

C# WPF WebBrowser Control: How to use JavaScript

I am using the WPF WebBrowser control and I want to access some JavaScript functions, but I've run into a problem.
I can use InvokeScript and execute browser.InvokeScript("alert", "Hello"); but how do I get an element by ID or by tag, and how do I assign that element to a JavaScript variable?
Example:
Javascript:
var elements = document.getElementsByTagName("embed");
elements[0].doSomething();
C#:
How?
I tried everything but nothing worked. Can anyone help me? :(
Quite a late answer, but if anyone else needs it:
The direct C#: http://msdn.microsoft.com/en-us/library/system.windows.forms.htmldocument.getelementsbytagname.aspx
HtmlElementCollection elems = webBrowser1.Document.GetElementsByTagName("embed");
foreach (HtmlElement elem in elems)
{
    elem.InvokeMember("doSomething");
}
The alternative: http://msdn.microsoft.com/en-us/library/a0746166
Basically you should create a function in JS:
var myCustomFunc = function(tagName) {
    var elements = document.getElementsByTagName(tagName);
    elements[0].doSomething();
}
And then call it from C# with
webBrowser1.Document.InvokeScript("myCustomFunc", new String[] { "embed" });
The tagName parameter receives the value "embed".

Why is my function just skipping over the code that uses HtmlAgilityPack?

The first half of my function doesn't use HtmlAgilityPack, and I know it works as I want. However, the function finishes without doing anything in the second half and doesn't report any errors. Please help.
void classListHtml()
{
    HtmlElementCollection elements = browser.Document.GetElementsByTagName("tr");
    html = "<table>";
    int i = 0;
    foreach (HtmlElement element in elements)
    {
        if (element.InnerHtml.Contains("Marking Period 2") && i != 0)//will be changed to current assignment reports later
        {
            html += "" + element.OuterHtml;
        }
        else if (i == 0)
        {
            i++;
            continue;
        }
        else
            continue;
    }
    html += "" + "</table>";
    myDocumentText(html);
    //---------THIS IS WHERE IT STOPS DOING WHAT I WANT-----------
    //removing color and other attributes
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.Load(html);
    HtmlNodeCollection nodeCollection = doc.DocumentNode.SelectNodes("//tr");//xpath expression for all row nodes
    string[] blackListAttributes = { "width", "valign", "bgcolor", "align", "class" };
    foreach (HtmlNode node in nodeCollection)//for each row node
    {
        HtmlAttributeCollection rows = node.Attributes;// the attributes of each row node
        foreach (HtmlAttribute attribute in rows)//for each attribute
        {
            if (blackListAttributes.Contains(attribute.Name))//if its attribute name is in the blacklist, remove it.
                attribute.Remove();
        }
    }
    html = doc.ToString();
    myDocumentText(html);//updating browser with new html
}
HtmlDocument.ToString() does not return the document's markup (unless you changed the original library code). Maybe you're looking for HtmlDocument.DocumentNode.OuterHtml or HtmlDocument.Save( ... text ...)?
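For comparison, here is a minimal sketch of how the attribute-stripping half of the function could be written on its own. It is not the asker's confirmed fix. It assumes the HtmlAgilityPack package is referenced, and the class and method names (AttributeStripper, StripRowAttributes) are hypothetical. Two details differ from the question's code and may be relevant: LoadHtml parses markup from a string, whereas Load(string) expects a file path, and the attributes are copied to a list before any are removed.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class AttributeStripper
{
    static readonly string[] BlackListAttributes = { "width", "valign", "bgcolor", "align", "class" };

    // Hypothetical helper: removes the blacklisted attributes from every <tr>
    // in the given markup and returns the resulting HTML.
    public static string StripRowAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string; Load(string) would treat it as a file path

        // SelectNodes returns null when nothing matches, so guard before looping.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows != null)
        {
            foreach (HtmlNode row in rows)
            {
                // Snapshot the attributes so removing one does not change
                // the collection that is being enumerated.
                foreach (HtmlAttribute attribute in row.Attributes.ToList())
                {
                    if (BlackListAttributes.Contains(attribute.Name))
                        attribute.Remove();
                }
            }
        }

        // The serialized document, as suggested in the answer above.
        return doc.DocumentNode.OuterHtml;
    }
}
```

In the question's function, the returned string would take the place of doc.ToString() before the call to myDocumentText(html).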
myDocumentText(html);
What does this method do?
My assumption is that you have an exception being thrown somewhere within this method, and it's either being swallowed, or your debug environment is set to not break on user thrown exceptions.
Can you post the code within this method?
