I wrote a function to save a site's source as my WebBrowser control navigates around. I can't save only WebBrowser.DocumentText, as that leaves out all frame content.
The issue I'm having now is accessing the frame content: I can't find which method or property contains it.
The following works with a simple WebBrowser control; just call saveWebsite(FilePath, WebBrowser1) in the DocumentCompleted event.
I've done this in VB.NET but am familiar with C#, so C# solutions are good too.
    Imports System.IO

    Public Sub saveWebsite(ByVal sDirectory As String, ByVal oBrowser As WebBrowser)
        File.WriteAllText(Path.Combine(sDirectory, "index.htm"), oBrowser.DocumentText)
        'Now write a file for each frame, putting each file in its relative path
        For Each oFrame As HtmlWindow In oBrowser.Document.Window.Frames
            Dim oFI As New FileInfo(Path.Combine(sDirectory, oBrowser.Url.MakeRelativeUri(oFrame.Url).ToString()))
            oFI.Directory.Create()
            'ISSUE: Unlike oBrowser, oFrame has no DocumentText property.
            'ISSUE: I've tried several things like Body.InnerText/InnerHtml, Body.OuterText/OuterHtml, etc.
            File.WriteAllText(oFI.FullName, oFrame.WindowFrameElement.InnerText)
        Next oFrame
    End Sub
After more experimenting, I found a solution. However, it's dirty and I don't particularly like it.
Switching the last (issue) line from oFrame.WindowFrameElement.InnerText to oFrame.Document.All.Item(0).OuterHtml seems to do the trick sometimes. This won't do anything about nested frames, but I'm not really worried about that.
Anyway, if anyone has a cleaner solution to the above, please let me know. (Or even a more effective or efficient way of "saving all".)
Edit: The following seems to work a bit better, but it's still not great. (I had a web page that started with <% VBSCRIPT %>, and that's all that got saved.)
    oFrame.Document.GetElementsByTagName("html").Item(0).OuterHtml
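For what it's worth, the frame-saving idea above can be extended to nested frames with a small recursive helper. This is only a sketch in C#, not a tested solution: the names FrameSaver and SaveFrames and the fallback file name frame.htm are made up for illustration, and it assumes that frames on another domain (which typically raise UnauthorizedAccessException when their Document is accessed) can simply be skipped.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

static class FrameSaver
{
    // Hypothetical helper: writes the markup of every reachable frame under
    // 'window' into 'directory', recursing into nested frames.
    public static void SaveFrames(HtmlWindow window, Uri pageUrl, string directory)
    {
        foreach (HtmlWindow frame in window.Frames)
        {
            try
            {
                HtmlDocument doc = frame.Document;
                if (doc == null) continue;

                HtmlElementCollection roots = doc.GetElementsByTagName("html");
                if (roots.Count == 0) continue;

                // Drop the query string so the result can be used as a file path.
                Uri frameUrl = new Uri(frame.Url.GetLeftPart(UriPartial.Path));
                Uri relativeUri = pageUrl.MakeRelativeUri(frameUrl);

                // MakeRelativeUri hands back an absolute URI when scheme, host
                // or port differ; such frames are skipped in this sketch.
                if (relativeUri.IsAbsoluteUri) continue;

                string relative = Uri.UnescapeDataString(relativeUri.ToString());
                if (relative.Length == 0 || relative.EndsWith("/"))
                    relative += "frame.htm"; // assumed fallback file name

                string path = Path.Combine(directory, relative);
                Directory.CreateDirectory(Path.GetDirectoryName(path));
                File.WriteAllText(path, roots[0].OuterHtml);

                // Nested frames are saved relative to the same top-level page.
                SaveFrames(frame, pageUrl, directory);
            }
            catch (UnauthorizedAccessException)
            {
                // Cross-domain frame: its document is not accessible from here.
            }
        }
    }
}
```

It would be called from DocumentCompleted as FrameSaver.SaveFrames(browser.Document.Window, browser.Url, directory). Note that OuterHtml is the parsed DOM serialized back to markup, so it will not match the original source byte for byte.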
I was facing a similar problem: I wanted to access all the text content within a frame in a page. The code below worked for me.
    Dim frame = WebBrowser1.Document.Window.Frames(0) 'Replace 0 with the frame index if needed
    Dim innerDiv = frame.Document.GetElementById("divContentLower")
    Dim contents = innerDiv.InnerText
    MsgBox(contents)
Here divContentLower is the id of an immediate child div within the frame, so the code returns its contents.
I'm working with a WinForms TabControl that shows WebBrowser controls to display YouTube videos.
However, with two or more videos it becomes really annoying, as all videos start playing immediately.
I basically need to find out whether there is a JS function, some HTML, or a simple WebBrowser property I can change so that videos start paused.
It might come in handy to find something like that for video quality too.
Has anybody ever seen where this option is stored? Or maybe the JS function that is invoked when the quality is set manually?
EDIT:
    b.DocumentCompleted += delegate
    {
        b.DocumentText = b.DocumentText.Insert(
            b.DocumentText.IndexOf("class=\"video-stream html5-main-video\""),
            "autoplay=false ");
    };
    b.Url = new System.Uri(inp[s], System.UriKind.Absolute);
This should add an event handler to each WebBrowser that modifies DocumentText once the URL passed at creation has loaded.
Even though the browser debugger shows
<video tabindex="-1" class="video-stream html5-main-video" controlslist="nodownload" style=... src=...></video>
this isn't in the actual source code.
However I found
$oa=function(a){g.S(a.o,"video-stream");g.S(a.o,"html5-main-video");var b=a.app.g;b.zc&&a.o.setAttribute("data-no-fullscreen",!0);b.Oh&&(a.o.setAttribute("webkit-playsinline",""),a.o.setAttribute("playsinline",""));b.Nr&&a.o&&a.P(a.o,"click",a.o.play,a.o)};
in base.js. Is it possible that YouTube generates the HTML from the JS?
How can I modify the video tag's attributes then?
I tried to restrict when the event handler manipulates the video tag, since DocumentCompleted events may also be raised by scripts or other sub-documents.
delegate (object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e){
if (e.Url.AbsolutePath != ((System.Windows.Forms.WebBrowser)sender).Url.AbsolutePath){
//...
However, it still fails, as there is no occurrence of that specific class on the video tag in DocumentText.
I have now dodged this by loading the URL only when the browser tab is selected. If someone finds a real solution, feel free to share.
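The workaround of loading a URL only when its tab is selected could look roughly like the sketch below. It assumes (this is not from the original code) that each TabPage contains exactly one WebBrowser as its first control and that the not-yet-loaded video URL has been stored in the page's Tag as a Uri; LazyTabs, Attach and LoadSelected are made-up names.

```csharp
using System;
using System.Windows.Forms;

static class LazyTabs
{
    // Hypothetical wiring: each TabPage holds one WebBrowser, and the page's
    // Tag holds the video Uri that has not been loaded yet.
    public static void Attach(TabControl tabs)
    {
        tabs.SelectedIndexChanged += delegate { LoadSelected(tabs); };
        LoadSelected(tabs); // load the tab that is visible at startup
    }

    static void LoadSelected(TabControl tabs)
    {
        TabPage page = tabs.SelectedTab;
        if (page == null || page.Controls.Count == 0) return;

        WebBrowser browser = page.Controls[0] as WebBrowser;
        Uri pending = page.Tag as Uri;
        if (browser == null || pending == null) return;

        page.Tag = null;            // navigate only once per tab
        browser.Navigate(pending);
    }
}
```

This only delays the start of each video until its tab is first shown; it does not pause a video that is already playing when the user switches away.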
I am working on a project that analyzes papers from Google Scholar. Basically, I parse the HTML and store the relevant fields in a database. However, I am stuck: while collecting the titles of the publications, I realized I can only get the first twenty elements, but there are sixty papers in the account in question:
http://scholar.google.com/citations?user=B7vSqZsAAAAJ
So I think that, as a solution, I need to click the 'show more' button programmatically so I can get all the titles, publication venues, etc.
What do you think? How can I perform that kind of action?
Edit: I checked the 'show more' button. Even when there is no next page to show, its HTML stays the same. As a workaround I could loop n times, but I am looking for a more robust solution.
Thank you for your time!
If this is about clicking a button within a WebBrowser control in a Windows Forms application, then yes, you can do it.
There are ways of getting more control over element identification by using XPath.
(You might need to use JavaScript to use XPath for object interactions. Since you haven't asked for that, I will assume you don't need it.)
webBrowser.Navigate("http://www.google.com");
// Or
HtmlElement textElement = webBrowser.Document.All.GetElementsByName("q")[0];
textElement.SetAttribute("value", "your text to search");
HtmlElement btnElement = webBrowser.Document.All.GetElementsByName("btnG")[0];
btnElement.InvokeMember("click");
Or even typing into text boxes with
webBrowser1.Document.GetElementById("gs_tti0").InnerText = "hello world";
If it's this website specifically, there is a simple workaround: change the query string to request the records you want.
http://scholar.google.com/citations?user=B7vSqZsAAAAJ&cstart=0&pagesize=2000
I have a form with a browser control. (This control uses IE9 because I set values in the registry.)
This web browser navigates to a specific URL, fills in all fields on the HTML page and submits them; then the result page is displayed.
My problem is that I want to know when this result page is fully loaded so that I can fetch some information.
I use the WebBroswer_DocumentCompleted event, which works fine for the first page but not for the result page, as it fires before the result page is loaded.
I tried another solution, which is to check for a div tag inside the result page (this tag only appears when the result page is completely loaded). It works, but not always.
My code:
    private void WebBroswer_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        HtmlElementCollection elc3 = this.BotBrowser.Document.GetElementsByTagName("div");
        foreach (HtmlElement el in elc3)
        {
            if (el.GetAttribute("id").Equals("Summary_Views")) // this determines that I am on the result page
            {
                // fetch the result
            }
        }
    }
That div id is "Summary_Views".
I can provide the link to that website on request; it is a BLAST tools and database website for research purposes.
Frames and IFrames will cause this event to fire multiple times. Check out this answer:
HTML - How do I know when all frames are loaded?
Or this answer:
How to use WebBrowser control DocumentCompleted event in C#?
Or Microsoft's KB article:
http://support.microsoft.com/kb/180366
Do you know if there are frames? If so then please say so, so people can help with that. If not then say so, so people can offer alternatives.
My guess is that the content is being generated by JavaScript. If it is then the document is complete before the JavaScript executes and you need to somehow wait until the JavaScript is done. The solution depends upon the web page. So you might need to process multiple document completes for diagnostic purposes and attempt to determine if there is a way to know which one you need.
I have finally solved my problem. I added a timer control from the toolbox, set its interval to 200 ms and its AutoReset property to false. Its tick event handler checks every 200 ms whether this div has been loaded; after that, the AutoReset property is set to true. This solution is working perfectly :)
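A polling approach like the one described might be sketched as follows. This is an illustration rather than the poster's actual code: it uses System.Windows.Forms.Timer, which has Interval, Tick, Start and Stop but no AutoReset property (AutoReset belongs to System.Timers.Timer), and the ResultWatcher class and its onReady callback are made-up names.

```csharp
using System;
using System.Windows.Forms;

class ResultWatcher
{
    readonly WebBrowser browser;
    readonly Timer timer = new Timer();      // System.Windows.Forms.Timer
    readonly Action<HtmlElement> onReady;    // hypothetical callback

    public ResultWatcher(WebBrowser browser, Action<HtmlElement> onReady)
    {
        this.browser = browser;
        this.onReady = onReady;
        timer.Interval = 200;                // poll every 200 ms
        timer.Tick += OnTick;
    }

    // Call this right after submitting the form.
    public void Start() { timer.Start(); }

    void OnTick(object sender, EventArgs e)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null) return;

        // The div only exists once the result page has been rendered.
        HtmlElement summary = doc.GetElementById("Summary_Views");
        if (summary == null) return;

        timer.Stop();                        // stop polling before handing over
        onReady(summary);
    }
}
```

The sketch polls indefinitely if the div never appears, so a real version would probably want a timeout as well.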
First question but I really am in a jam.
I have a web page renderer that is working perfectly. However, I need to be able to control the initial display position (almost like href #anchors in HTML) without any access to the site content.
As far as I can see, I have no access to the scroll bars other than the bool to enable or disable them.
Is there anything I can do to force a scroll down of, for example, 20%? Then I can create a form to adjust it later on.
Any assistance would be HUGELY appreciated, although from what I have researched it seems unlikely.
I have the regular windows WebBrowser Render
private System.Windows.Forms.WebBrowser m_webBrowser;
Thanks !
--This is for c# standalone application.. Not WebBased.
Have you tried using jQuery?
I personally use jQuery's animate method to scroll to certain elements in my web page.
Example:
    $('html, body').animate({scrollTop: $('#the-element-you-want-to-scroll-to').offset().top}, 1000);
PS: The last parameter controls how long the scroll to the destination takes (in milliseconds), which gives a nice effect.
I managed to resolve it using a strange method.
I basically injected some JavaScript into the rendered HTML manually. Then the rest was easy.
I used something like this:
string updatedSource = WebBrowser.DocumentText.Replace("Google", "Foogle");
string extraSource =
"<html><body>Script goes here <br/>" +
"<div><p>BLA BLA BLA</p></div></body></html>";
WebBrowser.DocumentText = extraSource + updatedSource;
WebBrowser.Update();
Maybe it will help someone.
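For the original "scroll down by 20%" requirement, another option that may be worth trying is the managed DOM wrapper, which needs no script injection. The sketch below is untested against any particular site: BrowserScroll and ScrollToFraction are made-up names, and it assumes the body's ScrollRectangle reflects the full content height, which can depend on the page's document mode.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

static class BrowserScroll
{
    // Hypothetical helper: scrolls the loaded document down to a fraction
    // (0.0 to 1.0) of its total scrollable height.
    public static void ScrollToFraction(WebBrowser browser, double fraction)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null || doc.Body == null || doc.Window == null) return;

        // ScrollRectangle describes the scrollable content of the element.
        Rectangle content = doc.Body.ScrollRectangle;
        int y = (int)(content.Height * fraction);

        doc.Window.ScrollTo(0, y);
    }
}
```

It would be called once the page has loaded, for example from DocumentCompleted: BrowserScroll.ScrollToFraction(m_webBrowser, 0.2);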
I've got some code that builds a PDF from an HTML template, then attaches several other PDFs to make one big PDF using abcPDF 7.
All this works fine and dandy -- however, I'd like to make some links in the HTML portion of the PDF to jump down to one of the several attached PDFs.
I tried creating links and anchors using the technique referenced here, by putting the
Link to another page
link in the HTML, then putting the anchor
<div><a name="elementId">A div that's on another page</a></div>
as an added-on paste-over on the top of the first page of the PDF I wanted to jump to.
I can see the text of the anchor just fine, and the link to it is blue, but it doesn't do anything.
As the next attempt, I've created bookmarks that work as well. Can someone point me in the direction to go back and adjust the links in the HTML portion to use them to jump to the bookmarks?
I apologize in advance for a lack of code, and I'm not asking for any code now.. I'd just like a more general way to go about it, like "try something like this." I'm not having much luck finding anything that is close to what I'm trying to do, not even on WebSuperGoo's website.
This method has worked for me in the latest ABCpdf version (9). Add a bookmark to each page in your document:
    For i = 1 To pdf.PageCount
        pdf.PageNumber = i
        pdf.AddBookmark("Page " & i, True)
    Next
Then where you want to insert a link you can reference the bookmark - in this case we create a table of contents by looping through each bookmark we've created:
For Each bm As Bookmark In pdf.Bookmark
toc &= "<Font annots='goto:" + bm.Page.PageNumber.ToString() + "'>" & bm.Title & "</Font><br>"
Next
pdf.AddHtml(toc)
The Websupergoo team supplied me with some example code and that's what this is based off of - so thanks to them!