I am trying to access the HTML of a page after it has been modified by the JavaScript on the page. This is what I have tried so far, based on what I found online.
using System;
using System.Windows.Forms;
using System.IO;
namespace WebBrowserDemo
{
class Program
{
public const string TestUrl = @"http://www.theverge.com/2012/7/2/3126604/android-jelly-bean-updates-htc-samsung-google-pdk";
[STAThread]
static void Main(string[] args)
{
WebBrowser wb = new WebBrowser();
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
wb.Navigate(TestUrl);
while (wb.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
Console.WriteLine("\nPress any key to continue...");
Console.ReadKey(true);
}
static void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = (WebBrowser)sender;
HtmlElement document = wb.Document.GetElementsByTagName("html")[0];
using (StreamWriter sw = new StreamWriter("OuterHTML.txt"))
{
sw.WriteLine(document.OuterHtml);
}
var abc = wb.Document.InvokeScript("eval", new object[] { "window.scrollTo(0, document.body.scrollHeight);" });
Console.WriteLine();
document = wb.Document.GetElementsByTagName("html")[0];
using (StreamWriter sw = new StreamWriter("OuterHTML2.txt"))
{
sw.WriteLine(document.OuterHtml);
}
}
}
}
The ultimate goal is to scroll to the bottom of the page, triggering any JS that loads the comments on the article. Currently, though, the HTML I get back before and after the script is run is the same.
Any suggestions?
Thanks
You should do it with a WebBrowser control.
This is basically a componentized version of IE. Load the page into the control. You probably do not even need to display the page. You can register an event handler that will be called when the page is fully loaded. There is no definite way to determine when the scripts have "completed" - scripts are open-ended and may run as long as they like. So you'd have to build in a heuristic "Wait period", then examine the HTML after that wait period passes.
Incidentally this is exactly what IECapt does.
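The "wait period" heuristic can be made a little more robust by polling instead of sleeping for a fixed time: sample the document's HTML on a timer and treat the page as settled once several consecutive samples are identical. The sketch below only illustrates that idea. `HtmlSettleDetector`, `requiredStableSamples` and the sample strings are hypothetical names, not part of the WebBrowser API, and a page that never stops mutating (ads, clocks) would never settle, so a real caller would also want an overall timeout.

```csharp
using System;

// Hypothetical helper: reports when successive HTML snapshots stop changing.
public sealed class HtmlSettleDetector
{
    private readonly int _requiredStableSamples;
    private string _lastSnapshot;
    private int _stableCount;

    public HtmlSettleDetector(int requiredStableSamples)
    {
        if (requiredStableSamples < 1)
            throw new ArgumentOutOfRangeException("requiredStableSamples");
        _requiredStableSamples = requiredStableSamples;
    }

    // Feed one snapshot (e.g. the <html> element's OuterHtml) per timer tick.
    // Returns true once the snapshot has repeated unchanged
    // requiredStableSamples times in a row.
    public bool AddSample(string html)
    {
        if (html != null && html == _lastSnapshot)
        {
            _stableCount++;
        }
        else
        {
            // The document changed (or this is the first sample): start over.
            _lastSnapshot = html;
            _stableCount = 0;
        }
        return _stableCount >= _requiredStableSamples;
    }
}

public static class Demo
{
    public static void Main()
    {
        var detector = new HtmlSettleDetector(2);
        // Made-up snapshots standing in for OuterHtml read on each tick.
        string[] samples =
        {
            "<html>a</html>", "<html>ab</html>", "<html>ab</html>", "<html>ab</html>"
        };
        foreach (string sample in samples)
            Console.WriteLine(detector.AddSample(sample));
    }
}
```

In a WinForms application, one would presumably start a System.Windows.Forms.Timer in the DocumentCompleted handler, pass wb.Document.GetElementsByTagName("html")[0].OuterHtml to AddSample on each Tick, and read the final HTML once it returns true.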
Related
I am working in a closed framework that uses C#.
I have multiple HTML documents which I need to print (formatted) one after another. I have tried using the WebBrowser object (System.Windows.Forms), which works fine if I only print one HTML document. The goal is to show the Preview or Print dialog exactly once at the start and then use those settings for the rest of the HTML documents. I was unable to find a suitable solution to this problem without using external libraries.
I have tried to concatenate the HTML Docs into the Browser.DocumentText with a loop. This was working, but I was unable to get the Page-Break working correctly.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Windows.Forms;
public class PrintHTML
{
private WebBrowser printer;
private List<string> htmlDocs;
private void PrintDocument(object sender, WebBrowserDocumentCompletedEventArgs e)
{
printer.ShowPageSetupDialog();
printer.ShowPrintDialog();
}
private void PrintPreview(object sender, WebBrowserDocumentCompletedEventArgs e)
{
printer.ShowPageSetupDialog();
printer.ShowPrintPreviewDialog();
}
private void DoPrint(Boolean preview)
{
printer = new WebBrowser();
htmlDocs = new List<string>();
string printText = "";
// add htmlDocs
// Show the preview dialog when a preview is requested, otherwise the print dialog.
if (preview)
{
printer.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(PrintPreview);
}
else
{
printer.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(PrintDocument);
}
foreach (string html in htmlDocs)
{
printText += html;
}
printer.DocumentText = printText;
}
}
I have also tried adding a page-break CSS rule for print media to the HTML documents:
@media print {
h6 {
page-break-after: always
}
}
Which seems to be working when I try to print the HTML with Google Chrome or IE but is not working when I print it through the WebBrowser object.
What I am looking for is either a generic page break, so that I can concatenate all my HTML documents, or a way to show the Print dialog ONCE and then silently print the remaining documents with the settings chosen in that dialog.
Thanks in advance!
I want to get the HTML code of a website. In a browser I can usually just click ‘View Page Source’ in the context menu or something similar. But how can I automate it? I’ve tried it with the WebBrowser class, but sometimes it doesn’t work. I am not a web developer, so I don’t really know whether my approach even makes sense. I think the main problem is that I sometimes get the HTML before all of the page's scripts have executed, so it is incomplete. I have this problem with, for example, this site: http://www.sreality.cz/en/search/for-sale/praha
My code (I’ve tried to make it small but runnable on its own):
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Windows.Forms;
namespace WebBrowserForm
{
internal static class Program
{
[STAThread]
private static void Main()
{
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
for (int i = 0; i < 10; i++)
{
Form1 f = new Form1();
f.ShowDialog();
}
// Now I can check Form1.List and see that some html is final and some is not
}
}
public class Form1 : Form
{
public static List<string> List = new List<string>();
private const string Url = "http://www.sreality.cz/en/search/for-sale/praha";
private System.Windows.Forms.WebBrowser webBrowser1;
public Form1()
{
this.webBrowser1 = new System.Windows.Forms.WebBrowser();
this.SuspendLayout();
this.webBrowser1.Dock = System.Windows.Forms.DockStyle.Fill;
this.webBrowser1.Name = "webBrowser1";
this.webBrowser1.TabIndex = 0;
this.ResumeLayout(false);
Load += new EventHandler(Form1_Load);
this.webBrowser1.ObjectForScripting = new MyScript();
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.Navigate(Url);
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
// Final html for 99% of web pages, but unfortunately not for all
string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
Application.DoEvents();
webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
Application.DoEvents();
}
}
[ComVisible(true)]
public class MyScript
{
public void CallServerSideCode()
{
HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
// here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
List.Add(renderedHtml);
((Form1)Application.OpenForms[0]).Close();
}
}
}
}
I would expect to be able to get the final HTML in the ‘webBrowser1_DocumentCompleted’ method. It usually works, but with this site it doesn’t. So I tried getting the HTML from my own code, which should be called from within the page: the ‘CallServerSideCode’ method. What is strange is that sometimes I get the final HTML (basically the same as if I do it manually in a browser) but sometimes not. I think the problem is that my script starts before the whole page is rendered instead of after. But I am not really sure, since this kind of thing is far from my comfort zone and I don’t really understand what I am doing. I’m just trying to apply something I found on the internet.
So, does anyone know what is wrong with the code? Or, more importantly, how to easily get the final HTML from the site?
Any help appreciated.
You should use the WebClient class to download the HTML page. No display control is necessary.
You want the DownloadString method.
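A minimal sketch of the WebClient approach, assuming a console application. Note that DownloadString returns the markup exactly as the server sends it; no JavaScript runs, so any content the page builds on the client side will be missing from the result. The class name `RawHtmlDownloader`, the UTF-8 assumption and the output file `raw.html` are placeholders for illustration. On newer .NET versions HttpClient is the recommended replacement for WebClient.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class RawHtmlDownloader
{
    // Downloads the response body as text. Scripts on the page are NOT executed.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8; // assumption: the page is served as UTF-8
            return client.DownloadString(url);
        }
    }

    public static void Main(string[] args)
    {
        // Default URL taken from the question; pass another one as the first argument.
        string url = args.Length > 0
            ? args[0]
            : "http://www.sreality.cz/en/search/for-sale/praha";
        string html = Download(url);
        File.WriteAllText("raw.html", html);
        Console.WriteLine("Downloaded {0} characters", html.Length);
    }
}
```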
It may help to append a call to your external function to the end of the body and wrap it in jQuery's DOM-ready function (this assumes the page already loads jQuery, so that `$` is defined). I mean something like this:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
// Final html for 99% of web pages, but unfortunately not for all
string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
HtmlElement body = webBrowser1.Document.GetElementsByTagName("body")[0];
HtmlElement scriptEl = webBrowser1.Document.CreateElement("script");
IHTMLScriptElement element = (IHTMLScriptElement)scriptEl.DomElement;
element.text = "$(function() { window.external.CallServerSideCode(); });";
body.AppendChild(scriptEl);
}
}
[ComVisible(true)]
public class MyScript
{
public void CallServerSideCode()
{
HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
// here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
List.Add(renderedHtml);
((Form1)Application.OpenForms[0]).Close();
}
}
I've searched up and down the internet throughout the day, and I'm just stumped.
What I want to do is play a youtube video inside of C# using the youtube API. Then I want a function on the form to be called when the video finishes playing. Unfortunately, I can't seem to find a way to get the events to fire.
(Using Visual C# 2010 Express, and have IE9. For reference.)
using System;
using System.Windows.Forms;
namespace WindowsFormsApplication1
{
using System.Runtime.InteropServices;
public partial class Form1 : Form
{
// This nested class must be ComVisible for the JavaScript to be able to call it.
[ComVisible(true)]
public class ScriptManager
{
// Variable to store the form of type Form1.
private Form1 mForm;
// Constructor.
public ScriptManager(Form1 form)
{
// Save the form so it can be referenced later.
mForm = form;
}
// This method can be called from JavaScript.
public void MethodToCallFromScript()
{
// Call a method on the form.
mForm.GoToNext();
}
}
public Form1()
{
InitializeComponent();
}
public void GoToNext()
{
MessageBox.Show("Play the next song");
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.Navigate("http://localhost/index.html");
}
}
}
That is my Form1.cs code. Form1.cs [Design] consists of nothing more than a webBrowser control.
I've tried numerous things to get this to work, from installing an http server to run the html 'live' to running it from a file directly off my computer, to setting the document text with the code as a string. All has failed me thus far. In IE9 if I open the index.html file locally (as a file and not through my webserver) the events do not fire. If I run it live off my webserver the events do fire. However in C# webBrowser control, these events do not seem to fire at all, no matter where it's run from.
<!DOCTYPE html>
<html>
<body>
<div id="player"></div>
<script>
var tag = document.createElement('script');
tag.src = "http://www.youtube.com/player_api";
var firstScriptTag = document.getElementsByTagName('script')[0];
firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
var player;
function onYouTubePlayerAPIReady() {
player = new YT.Player('player', {
height: '390',
width: '640',
playerVars: { 'autoplay': 1, 'controls': 1,'autohide':1,'wmode':'opaque' },
videoId: 'G4cRrOcDXXY',
events: {
'onReady': onPlayerReady,
'onStateChange': onPlayerStateChange
}
});
}
function onPlayerReady(event) {
event.target.mute();
}
function onPlayerStateChange(event) {
if(event.data === 0) {
alert('done');
window.external.MethodToCallFromScript();
}
}
</script>
</body>
</html>
I'm out of ideas, so any help would be greatly appreciated. I'd love to get events to fire in the C# WebBrowser control.
Wow. I was in the process of typing out my newest problem when I attempted something else with great success. The problem was that I'd navigate to my .html file on my webserver, and it'd begin playing a video, when the video finished, I'd have javascript tell C# to navigate to the same URL with a different youtube ID (to play another video). The second video would fail to fire the events.
I've overcome this by using different javascript, such as what was mentioned here.
I did start using Visual Studio 2013 Express and IE11. That cleared up quite a few of the problems I was bumping into on its own. I'll provide you guys with my current code, just in case anyone ever runs into the issues I've been dealing with.
My form:
using System;
using System.Windows.Forms;
using System.Data.SQLite;
namespace WindowsFormsApplication1
{
using System.Runtime.InteropServices;
public partial class Form1 : Form
{
// This nested class must be ComVisible for the JavaScript to be able to call it.
[ComVisible(true)]
public class ScriptManager
{
// Variable to store the form of type Form1.
private Form1 mForm;
// Constructor.
public ScriptManager(Form1 form)
{
// Save the form so it can be referenced later.
mForm = form;
}
public void AnotherMethod(string message)
{
mForm.GoToNext();
}
}
public Form1()
{
InitializeComponent();
}
public void GoToNext()
{
timer1.Interval = 2000;
timer1.Enabled = true;
}
public object MyInvokeScript(string name, params object[] args)
{
return webBrowser1.Document.InvokeScript(name, args);
}
public void SongCheck()
{
// Disable timer. Enable it later if there isn't a song to play.
if (timer1.Enabled)
timer1.Enabled = false;
// Connect to my SQLite db,
SQLiteConnection mySQLite = new SQLiteConnection("Data Source=ytsongrequest.s3db;Version=3;");
mySQLite.Open();
// The SQLite DB consists of three columns. id, youtubeid, requestor
// the 'id' auto increments when a row is added into the database.
string sqlCommand = "select * from songs order by id asc limit 1";
SQLiteCommand x = new SQLiteCommand(sqlCommand, mySQLite);
SQLiteDataReader reader = x.ExecuteReader();
if (reader.HasRows) {
while (reader.Read())
{
// Use our custom object to call a javascript function on our webpage.
object o = MyInvokeScript("createPlayerAndPlayVideo", reader["youtubeid"]);
label2.Text = reader["requestor"].ToString();
// Since we've played the song, we can now remove it.
x = new SQLiteCommand("delete from songs where id = " + reader["id"], mySQLite);
x.ExecuteNonQuery();
}
mySQLite.Close();
}
else
{
// Set a timer to check for a new song every 10 seconds.
timer1.Interval = 10000;
timer1.Enabled = true;
}
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.ObjectForScripting = new ScriptManager(this);
webBrowser1.Navigate("http://localhost/testing.html");
GoToNext();
}
private void timer1_Tick(object sender, EventArgs e)
{
SongCheck();
}
}
}
My HTML page that I have on my server:
<!DOCTYPE html>
<html>
<body>
<div id="player"></div>
<script>
var tag = document.createElement('script');
tag.src = "https://www.youtube.com/iframe_api";
var firstScriptTag = document.getElementsByTagName('script')[0];
firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
var query = getQueryParams(document.location.search);
var player;
var playerAPIReady;
function onYouTubePlayerAPIReady() {
playerAPIReady = true;
}
function onPlayerReady() {
player.playVideo();
player.addEventListener('onStateChange', function(e) {
if (e.data === 0) {
window.external.AnotherMethod('Finished video');
}
});
}
function getQueryParams(qs) {
qs = qs.split("+").join(" ");
var params = {}, tokens,
re = /[?&]?([^=]+)=([^&]*)/g;
while (tokens = re.exec(qs)) {
params[decodeURIComponent(tokens[1])]
= decodeURIComponent(tokens[2]);
}
return params;
}
function createPlayerAndPlayVideo(id) {
if(! playerAPIReady) {
// player API file not loaded
return;
}
if (! player) {
player = new YT.Player('player', {
height: '390',
width: '640',
videoId: id,
events: {
'onReady': onPlayerReady
}
});
}
else {
player.loadVideoById(id);
}
}
</script>
</body>
</html>
I am interested in checking the content of a website, the content changes frequently and when I view the website on any browser, it refreshes itself every 30 seconds. I want to know when the content has changed.
I am using WinForms and I want to just click a button to start a loop that runs every 30 seconds. I don't want to hit the website too frequently; in fact, the web page's own refresh is more than enough for my needs.
My code works when I click a button (btnCheckWebsite), if I wait a minute and then click btnCheckWebsite again, my message box pops up because the web page has changed. This is great however I want to do this in a while loop. When I un-comment my while loop, the DocumentText never changes. I have debugged it and for some reason it's the same text every time, even when the web page has changed in the real world, it stays the same in my code.
So my question is why can't I use a loop and what can I do instead to run this repeatedly without any input from me?
As a bonus, I would like to remove the .Refresh() call. I added it because the code won't work without it; however, as I understand it, this refreshes the whole page. When I use a browser, I see the page updating even when I don't refresh the whole page.
Just for background info, I did start by having a WebBrowser control on my form, the page refreshes automatically. I used the same code and have the same problem, interestingly, the WebBrowser control on my windows form refreshes by itself no problem, until I click btnCheckWebsite and then it stops refreshing! Also I know about webrequest but I don't know how to use it for my purposes.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Threading;
namespace Check_Website
{
public partial class Form1 : Form
{
public WebBrowser _memoryWebBrowser = new WebBrowser();
String _previousSource = "emptySource";
public Form1()
{
InitializeComponent();
_memoryWebBrowser.Navigate(new Uri("http://www.randomurl.com/"));
}
private void btnCheckWebsite_Click(object sender, EventArgs e)
{
//I want to un-comment this while loop and let my code run itself but it stops working
//when I introduce my while loop.
//while (1 < 2 )
//{
//Thread.Sleep(30000);
checkWebsite();
//}
}
private void checkWebsite()
{
//Why do I need this refresh? I would rather not have to hit the web page with a refresh.
//When I view the webpage it refreshed with new data however when I use a WebBrowser
//the refresh just doesn't happen unless I call Refresh.
_memoryWebBrowser.Refresh();
Thread.Sleep(500);
while (((_memoryWebBrowser.ReadyState != WebBrowserReadyState.Complete) || (_memoryWebBrowser.DocumentText.Length < 3000)))
{
Thread.Sleep(1000);
}
String source = _memoryWebBrowser.DocumentText;
if ((source != _previousSource) && (_previousSource != "emptySource"))
{
//Hey take a look at the interesting new stuff on this web page!!
MessageBox.Show("Great news, there's new stuff on this web page www.randomurl.co.uk!!" );
}
_previousSource = source;
}
}
}
You'd need to do your processing in the DocumentCompleted event handler. This event is asynchronous, so if you want to do this in a loop, the execution thread must pump messages for the event to fire. In a WinForms app, your UI thread is already pumping messages in Application.Run, and the only other endorsed way to enter a nested message loop on the same thread is via a modal form (here's how it can be done, see in the comments).
Another (IMO, better) way of doing such Navigate/DocumentCompleted logic without a nested message loop is by using async/await, here's how. In the classic sense, this is not exactly a loop, but conceptually and syntactically it might be exactly what you're looking for.
You can catch the WebBrowser.Navigated Event to get notified when the page was reloaded. So you wouldn't need a loop for that. (I meant the ready loop)
Just navigate every 30 seconds to the page in a loop and in the Navigated Event you can check whether the site has changed or not.
You'd better hook up DocumentCompleted event to check its DocumentText property!
The WebBrowser control is very buggy and has a lot of overhead for your needs. Instead, you should use WebRequest. Since you said you don't know how to use it, here's a (working) example for you.
using System;
using System.Windows.Forms;
using System.Net;
using System.IO;
namespace Check_Website
{
public partial class Form1 : Form
{
String _previousSource = string.Empty;
System.Windows.Forms.Timer timer;
private System.Windows.Forms.CheckBox cbCheckWebsite;
private System.Windows.Forms.TextBox tbOutput;
public Form1()
{
InitializeComponent();
this.cbCheckWebsite = new System.Windows.Forms.CheckBox();
this.tbOutput = new System.Windows.Forms.TextBox();
this.SuspendLayout();
//
// cbCheckWebsite
//
this.cbCheckWebsite.AutoSize = true;
this.cbCheckWebsite.Location = new System.Drawing.Point(12, 12);
this.cbCheckWebsite.Name = "cbCheckWebsite";
this.cbCheckWebsite.Size = new System.Drawing.Size(80, 17);
this.cbCheckWebsite.TabIndex = 0;
this.cbCheckWebsite.Text = "checkBox1";
this.cbCheckWebsite.UseVisualStyleBackColor = true;
//
// tbOutput
//
this.tbOutput.Location = new System.Drawing.Point(12, 35);
this.tbOutput.Multiline = true;
this.tbOutput.Name = "tbOutput";
this.tbOutput.Size = new System.Drawing.Size(260, 215);
this.tbOutput.TabIndex = 1;
//
// Form1
//
this.ClientSize = new System.Drawing.Size(284, 262);
this.Controls.Add(this.tbOutput);
this.Controls.Add(this.cbCheckWebsite);
this.Name = "Form1";
this.Load += new System.EventHandler(this.Form1_Load);
this.ResumeLayout(false);
this.PerformLayout();
timer = new System.Windows.Forms.Timer();
timer.Interval = 30000;
timer.Tick += timer_Tick;
}
private void Form1_Load(object sender, EventArgs e)
{
timer.Start();
}
void timer_Tick(object sender, EventArgs e)
{
if (!cbCheckWebsite.Checked) return;
WebRequest request = WebRequest.Create("http://localhost/check_website.html");
request.Method = "GET";
WebResponse response = request.GetResponse();
string newContent;
using (var sr = new StreamReader(response.GetResponseStream()))
{
newContent = sr.ReadToEnd();
}
tbOutput.Text += newContent + "\r\n";
if (_previousSource == string.Empty)
{
tbOutput.Text += "Nah. It's empty";
}
else if (_previousSource == newContent)
{
tbOutput.Text += "Nah. Equals the old content";
}
else
{
tbOutput.Text += "Oh great. Something happened";
}
_previousSource = newContent;
}
}
}
I'm trying to make a C# Windows Forms application with a web browser.
I'm using the WebKit browser: Link to the browser
I put the web browser in a class file so I can access it from all the forms I'm going to use.
The code that creates the web browser:
public static WebKit.WebKitBrowser mainBrowser = new WebKitBrowser();
I have this piece of code that is giving me problems:
globalVars.mainBrowser.Navigate("http://www.somesite.com/");
while (globalVars.mainBrowser.IsBusy)
{
System.Threading.Thread.Sleep(500);
}
globalVars.mainBrowser.Document.GetElementById("user").TextContent = "User Name";
But it's not working. If I show a message box after the while loop, it appears before the page has had a chance to render...
So what is the best way to wait until the site is fully loaded?
UPDATE 1
In a standalone class file, I create the WebKit control like this:
public static WebKit.WebKitBrowser mainBrowser = new WebKitBrowser();
And in a form, I now have this code (thanks to Tearsdontfalls):
public void loginthen()
{
globalVars.mainBrowser.DocumentCompleted += mainBrowser_DocumentCompleted;
globalVars.mainBrowser.Navigate("http://www.somesite.com/");
}
void mainBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var send = sender as WebKit.WebKitBrowser;
if (send.Url == e.Url)
{
MessageBox.Show("Inloggen");
globalVars.mainBrowser.Document.GetElementById("user").TextContent = "User Name";
}
}
But no message box shows up. If I use a local WebKit browser (on the same form), I do get the MessageBox, but then the user field isn't filled in.
Even a breakpoint in the DocumentCompleted event handler isn't hit. So it looks like the event listener isn't working...
So why is it not working?
You can simply add an event handler for the DocumentCompleted event of your web browser, or you can attach it dynamically like this:
globalVars.mainBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(mainBrowser_DocumentCompleted);
Where mainBrowser_DocumentCompleted is the name of the void method, in which you can do something like this (I used the names from your code):
void mainBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
var send = sender as WebKit.WebKitBrowser;
if (send.Url == e.Url) {
globalVars.mainBrowser.Document.GetElementById("user").TextContent = "User Name";
}
}
Adding the following piece of code makes the events fire even when the browser is not visible.
using (Bitmap bmp = new Bitmap(webKitBrowser.Width, webKitBrowser.Height))
{
webKitBrowser.DrawToBitmap(
bmp,
new Rectangle(
webKitBrowser.Location.X,
webKitBrowser.Location.Y,
webKitBrowser.Width,
webKitBrowser.Height
)
);
}