I am working in a closed framework that uses C#.
I have multiple HTML documents that I need to print (formatted) one after another. I have tried the WebBrowser object (System.Windows.Forms), which works fine if I print only one HTML document. The goal is to show the Preview or Print dialog exactly once at the start and then reuse those settings for the rest of the HTML documents. I was unable to find a suitable solution that does not use external libraries.
I have tried concatenating the HTML documents into Browser.DocumentText in a loop. This worked, but I could not get the page break to work correctly.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Windows.Forms;
public class PrintHTML {
private WebBrowser printer;
private List<string> htmlDocs;
private void PrintDocument(object sender, WebBrowserDocumentCompletedEventArgs e)
{
printer.ShowPageSetupDialog();
printer.ShowPrintDialog();
}
private void PrintPreview(object sender, WebBrowserDocumentCompletedEventArgs e)
{
printer.ShowPageSetupDialog();
printer.ShowPrintPreviewDialog();
}
private void DoPrint(Boolean preview)
{
printer = new WebBrowser();
htmlDocs = new List<string>();
string printText = "";
// add htmlDocs
if (preview) {
printer.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(PrintPreview);
} else {
printer.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(PrintDocument);
}
foreach (string html in htmlDocs)
{
printText += html;
}
printer.DocumentText = printText;
}
}
I have also tried adding a page-break rule inside a print media query to the HTML documents:
@media print {
h6 {
page-break-after: always
}
}
This seems to work when I print the HTML with Google Chrome or IE, but not when I print it through the WebBrowser object.
What I am looking for is either a generic page break, so that I can concatenate all my HTML documents, or a solution where I show the Print dialog ONCE and then silently print the remaining documents with the settings chosen in that dialog.
Thanks in advance!
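One thing that may be worth ruling out: concatenating complete documents produces several html and body elements in a single string, which is not valid markup and may confuse the renderer. Below is a minimal sketch that takes only the body content of each document and wraps it in a div carrying an inline page-break rule. The names HtmlMerger, ExtractBody and Merge are made up for this illustration, the regex is a simplification that ignores each document's head (so per-document styles are lost), and whether the WebBrowser control honours the inline rule has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any framework.
public static class HtmlMerger
{
    // Matches the content between <body ...> and </body>, case-insensitively.
    private static readonly Regex BodyPattern = new Regex(
        @"<body[^>]*>(.*?)</body>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the body content of a document, or the whole text if no <body> is found.
    public static string ExtractBody(string html)
    {
        Match match = BodyPattern.Match(html);
        return match.Success ? match.Groups[1].Value : html;
    }

    // Builds one document in which every source document sits in its own <div>;
    // every div except the last carries an inline page-break rule.
    public static string Merge(IList<string> htmlDocs)
    {
        var sb = new StringBuilder();
        sb.Append("<!DOCTYPE html><html><head><meta charset=\"utf-8\"></head><body>");
        for (int i = 0; i < htmlDocs.Count; i++)
        {
            bool isLast = i == htmlDocs.Count - 1;
            sb.Append(isLast ? "<div>" : "<div style=\"page-break-after: always;\">");
            sb.Append(ExtractBody(htmlDocs[i]));
            sb.Append("</div>");
        }
        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

The result of HtmlMerger.Merge(htmlDocs) could then be assigned to printer.DocumentText in place of the plain concatenation, so that the dialog is shown only once for the combined document.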
I am trying to use the CefSharp browser in C# WinForms. I need to know how to tell when a page has completely loaded, and how to get the browser document and its HTML elements.
So far I only initialize the browser and don't know what to do next:
public Form1()
{
InitializeComponent();
Cef.Initialize(new CefSettings());
browser = new ChromiumWebBrowser("http://google.com");
BrowserContainer.Controls.Add(browser);
browser.Dock = DockStyle.Fill;
}
CefSharp has a LoadingStateChanged event whose handler receives LoadingStateChangedEventArgs.
LoadingStateChangedEventArgs has a property called IsLoading, which indicates whether the page is still loading.
You should be able to subscribe to it like this:
browser.LoadingStateChanged += OnLoadingStateChanged;
The method would look like this:
private void OnLoadingStateChanged(object sender, LoadingStateChangedEventArgs args)
{
if (!args.IsLoading)
{
// Page has finished loading, do whatever you want here
}
}
I believe you can get the page source like this:
string HTML = await browser.GetSourceAsync();
You'd probably need to get to grips with something like HtmlAgilityPack to parse it. I'm not going to cover that, as it's off topic.
I ended up using:
using CefSharp;
wbAuthorization.AddressChanged += OnAddressChanged;
and
private void OnAddressChanged(
object s,
AddressChangedEventArgs e)
{
if (e.Address.StartsWith(EndUri))
{
ResultUri = new Uri(e.Address);
this.DialogResult = DialogResult.OK;
}
}
EndUri is the final page I want to examine and ResultUri contains a string I want to extract later. Just some example code from a larger class.
I have a list that contains paths to HTML files on my PC. I would like to loop through this list and print them all, in the same order as they appear in the list.
I tried looping over the code I found on msdn.microsoft.com for printing an HTML file.
List<string> AllHTMLsToPrint = new List<string>();
//things added to AllHTMLsToPrint list
foreach (string strHTMLToPrint in AllHTMLsToPrint)
{
PrintHelpPage(strHTMLToPrint);
}
private void PrintHelpPage(string strHTMLToPrint)
{
// Create a WebBrowser instance.
WebBrowser webBrowserForPrinting = new WebBrowser();
// Add an event handler that prints the document after it loads.
webBrowserForPrinting.DocumentCompleted +=
new WebBrowserDocumentCompletedEventHandler(PrintDocument);
// Set the Url property to load the document.
webBrowserForPrinting.Url = new Uri(strHTMLToPrint);
Thread.Sleep(100);
}
private void PrintDocument(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// Print the document now that it is fully loaded.
((WebBrowser)sender).Print();
// Dispose the WebBrowser now that the task is complete.
((WebBrowser)sender).Dispose();
}
You have a design problem here. You walk your list of html pages to print. Then you open the page in a browser. When the page is loaded you print it.
BUT...
Loading a page may take longer than 100 ms, which is all the time you wait before the loop moves on to the next page. Change your code so that the next page loads only after the current one has been printed. Instead of a loop, keep an index and increment it after each print.
Should look similar to this (not tested):
List<string> AllHTMLsToPrint = new List<string>();
private int index = 0;
PrintHelpPage(AllHTMLsToPrint[index]);
private void PrintHelpPage(string strHTMLToPrint)
{
// Create a WebBrowser instance.
WebBrowser webBrowserForPrinting = new WebBrowser();
// Add an event handler that prints the document after it loads.
webBrowserForPrinting.DocumentCompleted +=
new WebBrowserDocumentCompletedEventHandler(PrintDocument);
// Set the Url property to load the document.
webBrowserForPrinting.Url = new Uri(strHTMLToPrint);
}
private void PrintDocument(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// Print the document now that it is fully loaded.
((WebBrowser)sender).Print();
if (index < AllHTMLsToPrint.Count -1)
PrintHelpPage(AllHTMLsToPrint[++index]);
}
You've stated that you have a bunch of local HTML files.
Loading local HTML files by setting the Url property may not work.
You could try setting DocumentStream instead. strHTMLToPrint must then contain the full or relative path to your local HTML file.
webBrowserForPrinting.DocumentStream = File.OpenRead(strHTMLToPrint);
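If you would rather keep the Url approach, one thing to check is that each path ends up as an absolute file URI, because new Uri(...) throws a UriFormatException for a relative path such as "page.html". A minimal sketch, assuming the paths are relative to the current directory; LocalFileUri and FromPath are names made up for this example:

```csharp
using System;
using System.IO;

// Hypothetical helper for turning a local path into a file URI.
public static class LocalFileUri
{
    // Resolves a relative or absolute path against the current directory
    // and wraps the result in a Uri, which yields a file:// URI.
    public static Uri FromPath(string path)
    {
        string fullPath = Path.GetFullPath(path);
        return new Uri(fullPath);
    }
}
```

The result can be assigned directly, for example webBrowserForPrinting.Url = LocalFileUri.FromPath(strHTMLToPrint);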
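Not sure what the exact issue is, but I would move the loop into the DocumentCompleted handler, so that as soon as one document has loaded and printed, the code moves on to the next. I would also put this into a background worker so you don't hold up the main thread, though note that the WebBrowser control has to be created on an STA thread with a running message loop, so the control itself may have to stay on the UI thread.
That said, you haven't told us what your code isn't doing.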
public partial class Form1 : Form
{
internal List<string> AllHTMLsToPrint = new List<string>();
public Form1()
{
InitializeComponent();
}
public void StartPrinting()
{
//things added to AllHTMLsToPrint list, please note you may need to add file:/// to the URI list if it is a local file, unless it is compact framework
// start printing the first item
BackgroundWorker bgw = new BackgroundWorker();
bgw.DoWork += bgw_DoWork;
bgw.RunWorkerAsync();
/*foreach (string strHTMLToPrint in AllHTMLsToPrint)
{
PrintHelpPage(strHTMLToPrint);
}*/
}
void bgw_DoWork(object sender, DoWorkEventArgs e)
{
PrintHelpPage(AllHTMLsToPrint[0], (BackgroundWorker)sender);
}
private void PrintHelpPage(string strHTMLToPrint, BackgroundWorker bgw)
{
// Create a WebBrowser instance.
WebBrowser webBrowserForPrinting = new WebBrowser();
// Add an event handler that prints the document after it loads.
webBrowserForPrinting.DocumentCompleted += (s, ev) => {
webBrowserForPrinting.Print();
webBrowserForPrinting.Dispose();
// you can add progress reporting here
// remove the first element and see if we have to do it all again
AllHTMLsToPrint.RemoveAt(0);
if (AllHTMLsToPrint.Count > 0)
PrintHelpPage(AllHTMLsToPrint[0], bgw);
};
// Set the Url property to load the document.
webBrowserForPrinting.Url = new Uri(strHTMLToPrint);
}
}
I want to get the HTML code of a website. In a browser I can usually just click 'View Page Source' in the context menu or something similar, but how can I automate it? I've tried it with the WebBrowser class, but sometimes it doesn't work. I am not a web developer, so I don't really know whether my approach even makes sense. I think the main problem is that I sometimes get HTML for which not all of the page's scripts have run yet, so it is incomplete. I have the problem with, for example, this site: http://www.sreality.cz/en/search/for-sale/praha
My code (I’ve tried to make it small but runnable on its own):
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Windows.Forms;
namespace WebBrowserForm
{
internal static class Program
{
[STAThread]
private static void Main()
{
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
for (int i = 0; i < 10; i++)
{
Form1 f = new Form1();
f.ShowDialog();
}
// Now I can check Form1.List and see that some html is final and some is not
}
}
public class Form1 : Form
{
public static List<string> List = new List<string>();
private const string Url = "http://www.sreality.cz/en/search/for-sale/praha";
private System.Windows.Forms.WebBrowser webBrowser1;
public Form1()
{
this.webBrowser1 = new System.Windows.Forms.WebBrowser();
this.SuspendLayout();
this.webBrowser1.Dock = System.Windows.Forms.DockStyle.Fill;
this.webBrowser1.Name = "webBrowser1";
this.webBrowser1.TabIndex = 0;
this.ResumeLayout(false);
Load += new EventHandler(Form1_Load);
this.webBrowser1.ObjectForScripting = new MyScript();
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.Navigate(Url);
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
// Final html for 99% of web pages, but unfortunately not for all
string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
Application.DoEvents();
webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
Application.DoEvents();
}
}
[ComVisible(true)]
public class MyScript
{
public void CallServerSideCode()
{
HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
// here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
List.Add(renderedHtml);
((Form1)Application.OpenForms[0]).Close();
}
}
}
}
I would expect to get the final HTML in the 'webBrowser1_DocumentCompleted' method. It usually works, but not with this site. So I tried to get the HTML from my own code, which should be executed from within the web page: the 'CallServerSideCode' method. What is strange is that sometimes I get the final HTML (basically the same as if I did it manually in a browser) and sometimes not. I think the problem is that my script starts before the whole page has been rendered instead of after. But I am not really sure, since this kind of thing is far from my comfort zone and I don't really understand what I am doing. I'm just trying to apply something I found on the internet.
So, does anyone know what is wrong with the code? Or, even more importantly, how to easily get the final HTML from the site?
Any help appreciated.
You should use the WebClient class to download the HTML page. No display control is necessary.
The method you want is DownloadString.
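A minimal sketch of that call is below. PageDownloader and DownloadHtml are names made up for this example, and the UTF-8 encoding is an assumption about the target page. Keep in mind that WebClient only returns the raw response and runs no scripts, so anything the page builds with JavaScript will not be in the result.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical wrapper around WebClient.DownloadString.
public static class PageDownloader
{
    // Downloads the raw response body as text. No scripts are executed,
    // so content that the page builds with JavaScript will be missing.
    public static string DownloadHtml(string address)
    {
        using (var client = new WebClient())
        {
            client.Encoding = Encoding.UTF8; // assumption: the page is UTF-8 encoded
            return client.DownloadString(address);
        }
    }
}
```

For the question's site this would be PageDownloader.DownloadHtml("http://www.sreality.cz/en/search/for-sale/praha").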
It may help to append a call to your external function to the end of the body and wrap it in jQuery's DOM-ready function. I mean something like this:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
// Final html for 99% of web pages, but unfortunately not for all
string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
HtmlElement body = webBrowser1.Document.GetElementsByTagName("body")[0];
HtmlElement scriptEl = webBrowser1.Document.CreateElement("script");
IHTMLScriptElement element = (IHTMLScriptElement)scriptEl.DomElement;
element.text = "$(function() { window.external.CallServerSideCode(); });";
body.AppendChild(scriptEl);
}
}
[ComVisible(true)]
public class MyScript
{
public void CallServerSideCode()
{
HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
// here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
List.Add(renderedHtml);
((Form1)Application.OpenForms[0]).Close();
}
}
I'm trying to make a C# Windows Forms application with a web browser.
I'm using the WebKit browser (WebKit.WebKitBrowser).
I put the browser in a class file so that I can access it from all the forms I'm going to use.
The code that creates the browser:
public static WebKit.WebKitBrowser mainBrowser = new WebKitBrowser();
This is the piece of code that gives me problems:
globalVars.mainBrowser.Navigate("http://www.somesite.com/");
while (globalVars.mainBrowser.IsBusy)
{
System.Threading.Thread.Sleep(500);
}
globalVars.mainBrowser.Document.GetElementById("user").TextContent = "User Name";
But it's not working. If I show a message box after the while loop, it appears before the page has had a chance to render.
So what is the best way to wait until the site is fully loaded?
UPDATE 1
In a standalone class file, I create the WebKit control like this:
public static WebKit.WebKitBrowser mainBrowser = new WebKitBrowser();
And in a form, I now have this code (thanks to Tearsdontfalls):
public void loginthen()
{
globalVars.mainBrowser.DocumentCompleted += mainBrowser_DocumentCompleted;
globalVars.mainBrowser.Navigate("http://www.somesite.com/");
}
void mainBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var send = sender as WebKit.WebKitBrowser;
if (send.Url == e.Url)
{
MessageBox.Show("Inloggen");
globalVars.mainBrowser.Document.GetElementById("user").TextContent = "User Name";
}
}
But no message box shows up. If I use a local WebKit browser (on the same form), I do get the MessageBox, but then the user field isn't filled in.
Even a breakpoint in the DocumentCompleted handler isn't hit, so it looks like the event listener isn't working.
So why is it not working?
You can simply add an event listener for the DocumentCompleted event of your browser in the designer, or you can attach it in code like this:
globalVars.mainBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(mainBrowser_DocumentCompleted);
Here mainBrowser_DocumentCompleted is the name of the handler method, in which you can do something like this (I used the names from the code you provided):
void mainBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
var send = sender as WebKit.WebKitBrowser;
if (send.Url == e.Url) {
globalVars.mainBrowser.Document.GetElementById("user").TextContent = "User Name";
}
}
Adding the following piece of code makes the events fire even when the browser is not visible.
using (Bitmap bmp = new Bitmap(webKitBrowser.Width, webKitBrowser.Height))
{
webKitBrowser.DrawToBitmap(
bmp,
new Rectangle(
webKitBrowser.Location.X,
webKitBrowser.Location.Y,
webKitBrowser.Width,
webKitBrowser.Height
)
);
}
I am attempting to access the HTML of a page after it has been modified by the JavaScript on the page. This is what I have tried so far, based on what I have found online.
using System;
using System.Windows.Forms;
using System.IO;
namespace WebBrowserDemo
{
class Program
{
public const string TestUrl = @"http://www.theverge.com/2012/7/2/3126604/android-jelly-bean-updates-htc-samsung-google-pdk";
[STAThread]
static void Main(string[] args)
{
WebBrowser wb = new WebBrowser();
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
wb.Navigate(TestUrl);
while (wb.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
Console.WriteLine("\nPress any key to continue...");
Console.ReadKey(true);
}
static void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = (WebBrowser)sender;
HtmlElement document = wb.Document.GetElementsByTagName("html")[0];
using (StreamWriter sw = new StreamWriter("OuterHTML.txt"))
{
sw.WriteLine(document.OuterHtml);
}
var abc = wb.Document.InvokeScript("eval", new object[] { "window.scrollTo(0, document.body.scrollHeight);" });
Console.WriteLine();
document = wb.Document.GetElementsByTagName("html")[0];
using (StreamWriter sw = new StreamWriter("OuterHTML2.txt"))
{
sw.WriteLine(document.OuterHtml);
}
}
}
}
The ultimate goal is to scroll to the bottom of the page, triggering any JS that loads the comments on the article. Currently, though, the HTML I get back before and after the script is run is the same.
Any Suggestions?
Thanks
You should do it with a WebBrowser control.
This is basically a componentized version of IE. Load the page into the control. You probably do not even need to display the page. You can register an event handler that will be called when the page is fully loaded. There is no definite way to determine when the scripts have "completed" - scripts are open-ended and may run as long as they like. So you'd have to build in a heuristic "Wait period", then examine the HTML after that wait period passes.
Incidentally this is exactly what IECapt does.
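One possible shape for such a heuristic, as a rough sketch rather than a tested solution: keep re-reading the HTML at a fixed interval and stop once it has stayed unchanged for a few consecutive reads, or once a timeout expires. StableHtmlWaiter and all of its parameters are names invented for this example, and the sketch assumes it is awaited from a thread with a running message loop (for instance inside a WinForms event handler) so that the browser can keep working between reads.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper implementing a "wait until the HTML stops changing" heuristic.
public static class StableHtmlWaiter
{
    // Polls getHtml until it returns the same text on requiredStableReads
    // consecutive reads, or until timeout elapses. Returns the last snapshot.
    public static async Task<string> WaitAsync(
        Func<string> getHtml,
        TimeSpan pollInterval,
        int requiredStableReads,
        TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        string last = getHtml();
        int stableReads = 1;
        while (stableReads < requiredStableReads && DateTime.UtcNow < deadline)
        {
            await Task.Delay(pollInterval);
            string current = getHtml();
            // Count how many reads in a row returned identical text.
            stableReads = current == last ? stableReads + 1 : 1;
            last = current;
        }
        return last;
    }
}
```

In the DocumentCompleted handler this could be called with a delegate such as () => wb.Document.GetElementsByTagName("html")[0].OuterHtml, for example with a 500 ms interval and three stable reads. Those numbers are arbitrary starting points, and a page that never stops changing will simply run into the timeout.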