I want to get the HTML code from a website. In a browser I can usually just click ‘View Page Source’ in the context menu or something similar. But how can I automate it? I’ve tried it with the WebBrowser class, but sometimes it doesn’t work. I am not a web developer, so I don’t really know if my approach even makes sense. I think the main problem is that I sometimes get HTML where not all of the page’s scripts have been executed, so it is incomplete. I have this problem with, e.g., this site: http://www.sreality.cz/en/search/for-sale/praha
My code (I’ve tried to make it small but runnable on its own):
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Windows.Forms;
namespace WebBrowserForm
{
internal static class Program
{
[STAThread]
private static void Main()
{
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
for (int i = 0; i < 10; i++)
{
Form1 f = new Form1();
f.ShowDialog();
}
// Now I can check Form1.List and see that some html is final and some is not
}
}
public class Form1 : Form
{
public static List<string> List = new List<string>();
private const string Url = "http://www.sreality.cz/en/search/for-sale/praha";
private System.Windows.Forms.WebBrowser webBrowser1;
public Form1()
{
this.webBrowser1 = new System.Windows.Forms.WebBrowser();
this.SuspendLayout();
this.webBrowser1.Dock = System.Windows.Forms.DockStyle.Fill;
this.webBrowser1.Name = "webBrowser1";
this.webBrowser1.TabIndex = 0;
this.ResumeLayout(false);
Load += new EventHandler(Form1_Load);
this.webBrowser1.ObjectForScripting = new MyScript();
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.Navigate(Url);
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
// Final html for 99% of web pages, but unfortunately not for all
string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
Application.DoEvents();
webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
Application.DoEvents();
}
}
[ComVisible(true)]
public class MyScript
{
public void CallServerSideCode()
{
HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
// here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
List.Add(renderedHtml);
((Form1)Application.OpenForms[0]).Close();
}
}
}
}
I would expect to be able to get the final HTML in the ‘webBrowser1_DocumentCompleted’ method. It usually works, but with this site it doesn’t. So I tried to get the HTML from my own code, which should be executed from within the web page: the ‘CallServerSideCode’ method. What is strange is that sometimes I get the final HTML (basically the same as if I do it manually via the browser) but sometimes not. I think the problem is that my script starts before the whole page is rendered instead of after. But I am not really sure, since this kind of thing is far from my comfort zone and I don’t really understand what I am doing. I’m just trying to apply something I found on the internet.
So, does anyone know what is wrong with the code? Or, even more importantly, how to easily get the final HTML from the site?
Any help appreciated.
You should use the WebClient class to download the HTML page. No display control is necessary.
The method you want is DownloadString.
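A minimal sketch of that suggestion, using the URL from the question (the class name DownloadExample is only illustrative). Keep in mind that DownloadString returns the markup as the server sends it; no page scripts are run, so anything the page builds with JavaScript will not be in the result.

```csharp
using System;
using System.Net;

internal static class DownloadExample
{
    private static void Main()
    {
        // URL taken from the question; substitute any page you want to fetch.
        const string url = "http://www.sreality.cz/en/search/for-sale/praha";

        using (var client = new WebClient())
        {
            // Returns the raw response body as a string. No JavaScript is executed.
            string html = client.DownloadString(url);
            Console.WriteLine("Downloaded " + html.Length + " characters");
        }
    }
}
```

If the downloaded markup is missing the listings you see in the browser, the page is most likely filling them in with scripts, and a browser control is needed after all.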
It may help to append a call to your external function to the end of the body and wrap it in jQuery's DOM-ready function (this assumes the page itself loads jQuery, so that $ is defined). I mean something like this:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
    {
        // Final html for 99% of web pages, but unfortunately not for all
        string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
        webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
        // Append a <script> element to the body that calls back into C# once the DOM is ready.
        HtmlElement body = webBrowser1.Document.GetElementsByTagName("body")[0];
        HtmlElement scriptEl = webBrowser1.Document.CreateElement("script");
        // IHTMLScriptElement comes from the mshtml COM interop assembly (add a reference and "using mshtml;").
        IHTMLScriptElement element = (IHTMLScriptElement)scriptEl.DomElement;
        element.text = "$(function() { window.external.CallServerSideCode(); });";
        body.AppendChild(scriptEl);
    }
}
[ComVisible(true)]
public class MyScript
{
    public void CallServerSideCode()
    {
        HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
        string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
        // here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
        List.Add(renderedHtml);
        ((Form1)Application.OpenForms[0]).Close();
    }
}
I am trying to use the CefSharp browser in C# WinForms and need to know how to tell when the page has completely loaded, and how to get the browser document and its HTML elements.
I just initialize the browser and don't know what I should do next:
public Form1()
{
InitializeComponent();
Cef.Initialize(new CefSettings());
browser = new ChromiumWebBrowser("http://google.com");
BrowserContainer.Controls.Add(browser);
browser.Dock = DockStyle.Fill;
}
CefSharp has a LoadingStateChanged event with LoadingStateChangedEventArgs.
LoadingStateChangedEventArgs has a property called IsLoading which indicates if the page is still loading.
You should be able to subscribe to it like this:
browser.LoadingStateChanged += OnLoadingStateChanged;
The method would look like this:
private void OnLoadingStateChanged(object sender, LoadingStateChangedEventArgs args)
{
    if (!args.IsLoading)
    {
        // Page has finished loading, do whatever you want here
    }
}
I believe you can get the page source like this:
string HTML = await browser.GetSourceAsync();
You'd probably need to get to grips with something like the Html Agility Pack to parse it; I'm not going to cover that as it's off topic.
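Putting those pieces together, a sketch might look like the following. The form name BrowserForm is hypothetical, and the code assumes the CefSharp.WinForms package is referenced. The handler is marked async so it can await GetSourceAsync. As far as I know the event is not raised on the WinForms UI thread, which is why the sketch marshals back with BeginInvoke before touching the form.

```csharp
using System;
using System.Windows.Forms;
using CefSharp;
using CefSharp.WinForms;

public class BrowserForm : Form // hypothetical form name
{
    private readonly ChromiumWebBrowser browser;

    public BrowserForm()
    {
        Cef.Initialize(new CefSettings());
        browser = new ChromiumWebBrowser("http://google.com") { Dock = DockStyle.Fill };
        browser.LoadingStateChanged += OnLoadingStateChanged;
        Controls.Add(browser);
    }

    private async void OnLoadingStateChanged(object sender, LoadingStateChangedEventArgs args)
    {
        // IsLoading can flip more than once during a navigation; only act when it is false.
        if (args.IsLoading)
        {
            return;
        }

        string html = await browser.GetSourceAsync();

        // Marshal back to the WinForms UI thread before touching any control.
        BeginInvoke(new Action(() => Text = "Loaded " + html.Length + " characters"));
    }
}
```

Because the handler can run several times for one navigation, any real code would probably want to guard against processing the same page twice.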
I ended up using:
using CefSharp;
wbAuthorization.AddressChanged += OnAddressChanged;
and
private void OnAddressChanged(
object s,
AddressChangedEventArgs e)
{
if (e.Address.StartsWith(EndUri))
{
ResultUri = new Uri(e.Address);
this.DialogResult = DialogResult.OK;
}
}
EndUri is the final page I want to examine and ResultUri contains a string I want to extract later. Just some example code from a larger class.
Here's my code:
delegate void del(string data);
public partial class CreateUser : System.Web.UI.Page
{
static string rfidkey;
SerialPort serialPort1;
//Timer timer1;
del MyDlg;
private delegate void SetTextDeleg(string text);
public CreateUser()
{
serialPort1 = new SerialPort();
serialPort1.DataReceived += new SerialDataReceivedEventHandler(serialPort1_DataReceived);
//
timer1 = new Timer();
timer1.Interval = 100;
timer1.Enabled = false;
timer1.Tick += new EventHandler<EventArgs>(timer1_Tick);
//
MyDlg = new del(Display);
}
void Display(string s)
{
txtRFIDKey.Text = s;
}
void serialPort1_DataReceived(object sender, SerialDataReceivedEventArgs e)
{
Data.s = serialPort1.ReadExisting();
//txtRFIDKey.Text = Data.s;
MyDlg.BeginInvoke(Data.s, null, null);
}
}
This cannot be done the way you are coding it. Basically, you are trying to treat your web page like a Windows application. You need to understand that the "code behind", the code you have written above, runs on one computer (the server), while the web page, where the text field is, is displayed on another (the client's browser). Once the page construction code finishes running, the server can't change the page. You can use postbacks and Ajax from the web page to call back to the server, but even that might not help you with the code you are trying to write.
To get a better understanding of the fundamentals of asp.net web page processing, start here: ASP.NET page life cycle explanation
Or you can just write it as a desktop application using Windows Forms or WPF.
I've searched up and down the internet throughout the day, and I'm just stumped.
What I want to do is play a youtube video inside of C# using the youtube API. Then I want a function on the form to be called when the video finishes playing. Unfortunately, I can't seem to find a way to get the events to fire.
(Using Visual C# 2010 Express, and have IE9. For reference.)
using System;
using System.Windows.Forms;
namespace WindowsFormsApplication1
{
using System.Runtime.InteropServices;
public partial class Form1 : Form
{
// This nested class must be ComVisible for the JavaScript to be able to call it.
[ComVisible(true)]
public class ScriptManager
{
// Variable to store the form of type Form1.
private Form1 mForm;
// Constructor.
public ScriptManager(Form1 form)
{
// Save the form so it can be referenced later.
mForm = form;
}
// This method can be called from JavaScript.
public void MethodToCallFromScript()
{
// Call a method on the form.
mForm.GoToNext();
}
}
public Form1()
{
InitializeComponent();
}
public void GoToNext()
{
MessageBox.Show("Play the next song");
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.Navigate("http://localhost/index.html");
}
}
}
That is my Form1.cs code. Form1.cs [Design] consists of nothing more than a webBrowser control.
I've tried numerous things to get this to work, from installing an http server to run the html 'live' to running it from a file directly off my computer, to setting the document text with the code as a string. All has failed me thus far. In IE9 if I open the index.html file locally (as a file and not through my webserver) the events do not fire. If I run it live off my webserver the events do fire. However in C# webBrowser control, these events do not seem to fire at all, no matter where it's run from.
<!DOCTYPE html>
<html>
<body>
<div id="player"></div>
<script>
var tag = document.createElement('script');
tag.src = "http://www.youtube.com/player_api";
var firstScriptTag = document.getElementsByTagName('script')[0];
firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
var player;
function onYouTubePlayerAPIReady() {
player = new YT.Player('player', {
height: '390',
width: '640',
playerVars: { 'autoplay': 1, 'controls': 1,'autohide':1,'wmode':'opaque' },
videoId: 'G4cRrOcDXXY',
events: {
'onReady': onPlayerReady,
'onStateChange': onPlayerStateChange
}
});
}
function onPlayerReady(event) {
event.target.mute();
}
function onPlayerStateChange(event) {
if(event.data === 0) {
alert('done');
window.external.MethodToCallFromScript();
}
}
</script>
</body>
</html>
I'm out of ideas, so any help would be greatly appreciated. I'd love to get events to fire in the C# WebBrowser control.
Wow. I was in the process of typing out my newest problem when I attempted something else with great success. The problem was that I'd navigate to my .html file on my webserver and it would begin playing a video. When the video finished, I'd have JavaScript tell C# to navigate to the same URL with a different YouTube ID (to play another video). The second video would fail to fire the events.
I've overcome this by using different javascript, such as what was mentioned here.
I did start using Visual Studio 2013 Express and IE11. That cleared up quite a few of the problems I was bumping into on its own. I'll provide you guys with my current code, just in case anyone ever runs into the issues I've been dealing with.
My form:
using System;
using System.Windows.Forms;
using System.Data.SQLite;
namespace WindowsFormsApplication1
{
using System.Runtime.InteropServices;
public partial class Form1 : Form
{
// This nested class must be ComVisible for the JavaScript to be able to call it.
[ComVisible(true)]
public class ScriptManager
{
// Variable to store the form of type Form1.
private Form1 mForm;
// Constructor.
public ScriptManager(Form1 form)
{
// Save the form so it can be referenced later.
mForm = form;
}
public void AnotherMethod(string message)
{
mForm.GoToNext();
}
}
public Form1()
{
InitializeComponent();
}
public void GoToNext()
{
timer1.Interval = 2000;
timer1.Enabled = true;
}
public object MyInvokeScript(string name, params object[] args)
{
return webBrowser1.Document.InvokeScript(name, args);
}
public void SongCheck()
{
    // Disable timer. Enable it later if there isn't a song to play.
    timer1.Enabled = false;
    // Connect to my SQLite db. 'using' closes the connection on every path,
    // including when there is no song to play.
    using (SQLiteConnection mySQLite = new SQLiteConnection("Data Source=ytsongrequest.s3db;Version=3;"))
    {
        mySQLite.Open();
        // The SQLite DB consists of three columns. id, youtubeid, requestor
        // the 'id' auto increments when a row is added into the database.
        long id = 0;
        string youtubeId = null;
        string requestor = null;
        using (SQLiteCommand select = new SQLiteCommand("select * from songs order by id asc limit 1", mySQLite))
        using (SQLiteDataReader reader = select.ExecuteReader())
        {
            if (reader.Read())
            {
                id = Convert.ToInt64(reader["id"]);
                youtubeId = reader["youtubeid"].ToString();
                requestor = reader["requestor"].ToString();
            }
        }
        if (youtubeId != null)
        {
            // Use our custom object to call a javascript function on our webpage.
            MyInvokeScript("createPlayerAndPlayVideo", youtubeId);
            label2.Text = requestor;
            // Since we've played the song, we can now remove it.
            using (SQLiteCommand delete = new SQLiteCommand("delete from songs where id = @id", mySQLite))
            {
                delete.Parameters.AddWithValue("@id", id);
                delete.ExecuteNonQuery();
            }
        }
        else
        {
            // Set a timer to check for a new song every 10 seconds.
            timer1.Interval = 10000;
            timer1.Enabled = true;
        }
    }
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.ObjectForScripting = new ScriptManager(this);
webBrowser1.Navigate("http://localhost/testing.html");
GoToNext();
}
private void timer1_Tick(object sender, EventArgs e)
{
SongCheck();
}
}
}
My HTML page that I have on my server:
<!DOCTYPE html>
<html>
<body>
<div id="player"></div>
<script>
var tag = document.createElement('script');
tag.src = "https://www.youtube.com/iframe_api";
var firstScriptTag = document.getElementsByTagName('script')[0];
firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);
var query = getQueryParams(document.location.search);
var player;
var playerAPIReady;
function onYouTubePlayerAPIReady() {
playerAPIReady = true;
}
function onPlayerReady() {
player.playVideo();
player.addEventListener('onStateChange', function(e) {
if (e.data === 0) {
window.external.AnotherMethod('Finished video');
}
});
}
function getQueryParams(qs) {
qs = qs.split("+").join(" ");
var params = {}, tokens,
re = /[?&]?([^=]+)=([^&]*)/g;
while (tokens = re.exec(qs)) {
params[decodeURIComponent(tokens[1])]
= decodeURIComponent(tokens[2]);
}
return params;
}
function createPlayerAndPlayVideo(id) {
if(! playerAPIReady) {
// player API file not loaded
return;
}
if (! player) {
player = new YT.Player('player', {
height: '390',
width: '640',
videoId: id,
events: {
'onReady': onPlayerReady
}
});
}
else {
player.loadVideoById(id);
}
}
</script>
</body>
</html>
This is what I have so far. I am trying to read the XML from the URL and get, for example, the temperature, humidity, etc. But every time I try something else it gives me an error. I want to retrieve the information and put it in a label.
namespace WindowsFormsApplication1 {
public partial class Form1: Form {
public Form1() {
InitializeComponent();
}
private void btnSubmit_Click(object sender, EventArgs e) {
String zip = txtZip.Text;
XmlDocument weatherURL = new XmlDocument();
weatherURL.Load("http://api.wunderground.com/api/"
your_key "/conditions/q/" + zip + ".xml");
foreach(XmlNode nodeselect in weatherURL.SelectNodes("response/current_observation"));
}
}
}
It took me a bit of trial and error but I've got it. In C#, make sure you have using System.Xml; at the top of the file.
Here is the code using the wunderground API. Make sure you sign up for a key, otherwise it will not work. Where it says your_key is where you put in your key. It should look something like this. I used a button and three labels to display the information.
using System;
using System.Windows.Forms;
using System.Xml;
namespace wfats2
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
XmlDocument doc1 = new XmlDocument();
doc1.Load("http://api.wunderground.com/api/your_key/conditions/q/92135.xml");
XmlElement root = doc1.DocumentElement;
XmlNodeList nodes = root.SelectNodes("/response/current_observation");
foreach (XmlNode node in nodes)
{
string tempf = node["temp_f"].InnerText;
string tempc = node["temp_c"].InnerText;
string feels = node["feelslike_f"].InnerText;
label2.Text = tempf;
label4.Text = tempc;
label6.Text = feels;
}
}
}
}
When you press the button, the information is displayed in the assigned labels. I am still experimenting; you could also add some sort of periodic refresh instead of pressing the button every time to get an update.
First off, yes, you need to give more information in your question, but offhand I can see that you have "your_key" inside of your URL. You probably need to replace that with your API key for this to work.
I am attempting to access the HTML of a page after it has been modified by the JavaScript on the page. This is what I have been attempting so far, based on what I have found online.
using System;
using System.Windows.Forms;
using System.IO;
namespace WebBrowserDemo
{
class Program
{
public const string TestUrl = @"http://www.theverge.com/2012/7/2/3126604/android-jelly-bean-updates-htc-samsung-google-pdk";
[STAThread]
static void Main(string[] args)
{
WebBrowser wb = new WebBrowser();
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
wb.Navigate(TestUrl);
while (wb.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
Console.WriteLine("\nPress any key to continue...");
Console.ReadKey(true);
}
static void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = (WebBrowser)sender;
HtmlElement document = wb.Document.GetElementsByTagName("html")[0];
using (StreamWriter sw = new StreamWriter("OuterHTML.txt"))
{
sw.WriteLine(document.OuterHtml);
}
var abc = wb.Document.InvokeScript("eval", new object[] { "window.scrollTo(0, document.body.scrollHeight);" });
Console.WriteLine();
document = wb.Document.GetElementsByTagName("html")[0];
using (StreamWriter sw = new StreamWriter("OuterHTML2.txt"))
{
sw.WriteLine(document.OuterHtml);
}
}
}
}
The ultimate goal is to scroll to the bottom of the page, activating any JS that loads the comments on the article. Currently, though, the HTML I get back before and after the script is run is the same.
Any Suggestions?
Thanks
You should do it with a WebBrowser control.
This is basically a componentized version of IE. Load the page into the control. You probably do not even need to display the page. You can register an event handler that will be called when the page is fully loaded. There is no definite way to determine when the scripts have "completed" - scripts are open-ended and may run as long as they like. So you'd have to build in a heuristic "Wait period", then examine the HTML after that wait period passes.
Incidentally this is exactly what IECapt does.
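The "wait period" idea above could be sketched roughly as follows. The class name DelayedCaptureForm and the 5-second wait are assumptions for illustration only; the right delay depends on the site and can only be found by trial. The timer is restarted on every DocumentCompleted, because that event can fire more than once for pages with frames.

```csharp
using System;
using System.Windows.Forms;

public class DelayedCaptureForm : Form // hypothetical form name
{
    // Assumed wait period after the last DocumentCompleted; tune it per site.
    private const int WaitMilliseconds = 5000;

    private readonly WebBrowser webBrowser = new WebBrowser { Dock = DockStyle.Fill, ScriptErrorsSuppressed = true };
    private readonly Timer waitTimer = new Timer { Interval = WaitMilliseconds };

    // Holds the captured markup once the form has closed.
    public string RenderedHtml { get; private set; }

    public DelayedCaptureForm(string url)
    {
        Controls.Add(webBrowser);
        webBrowser.DocumentCompleted += OnDocumentCompleted;
        waitTimer.Tick += OnWaitElapsed;
        Load += (sender, e) => webBrowser.Navigate(url);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Restart the wait each time a document (or frame) finishes loading.
        waitTimer.Stop();
        waitTimer.Start();
    }

    private void OnWaitElapsed(object sender, EventArgs e)
    {
        waitTimer.Stop();
        // Read the DOM as it stands after the scripts have had time to run.
        RenderedHtml = webBrowser.Document.GetElementsByTagName("html")[0].OuterHtml;
        Close();
    }
}
```

From an [STAThread] Main, this could be used as: using (var form = new DelayedCaptureForm(url)) { form.ShowDialog(); string html = form.RenderedHtml; }. It remains a heuristic: a page that keeps loading content after the wait period will still be captured incomplete.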