C# using webbrowser documenttext, document stays null - c#

I hope the title was clear enough, but I will try to explain...
I'm using C# Winforms ( dotnet 4.5 ).
The thing is that I'm creating a WebBrowser control and try to set the content with wb.DocumentText. But when I try to loop through the elements, it says that the document is empty (null)
Here's my code:
WebBrowser wb = new WebBrowser();
wb.DocumentText = leMessage;
HtmlElementCollection elems = wb.Document.GetElementsByTagName("a");
foreach (HtmlElement elem in elems)
{
// Do Some Stuff
}
leMessage holds an HTML newsletter message and there are some a tags in it.
I've already tried this: wb.Document.Body.InnerHtml = leMessage; but that didn't work either...
What did I miss or do wrong?

Setting WebBrowser.DocumentText is asynchronous. You need to handle the DocumentCompleted event before you can access the DOM, and keep pumping Windows messages in the meantime. Here's a complete example of web scraping, using async/await to keep the convenient linear code flow. Just alter the navigation part:
await NavigateAsync(ct, () => this.webBrowser.DocumentText = leMessage, timeout);
HtmlElementCollection elems = wb.Document.GetElementsByTagName("a");
This way you could do it in a loop. In a nutshell:
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace WinformsApp2
{
public partial class MainForm : Form
{
public MainForm()
{
InitializeComponent();
}
const string leMessage = "<a href='http://example.com'>Go there</a>";
private async void MainForm_Load(object sender, EventArgs e)
{
var wb = new WebBrowser();
TaskCompletionSource<bool> tcs = null;
WebBrowserDocumentCompletedEventHandler documentCompletedHandler = (sender2, e2) => tcs.TrySetResult(true);
for (int i = 0; i < 3; i++)
{
tcs = new TaskCompletionSource<bool>();
wb.DocumentCompleted += documentCompletedHandler;
try {
wb.DocumentText = leMessage;
await tcs.Task;
}
finally {
wb.DocumentCompleted -= documentCompletedHandler;
}
HtmlElementCollection elems = wb.Document.GetElementsByTagName("a");
foreach (HtmlElement elem in elems)
{
Debug.Print(elem.OuterHtml);
}
}
}
}
}
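The loop above waits indefinitely if DocumentCompleted never fires. A minimal sketch of the same idea with a timeout, assuming it is called on the UI thread of a running WinForms message loop; the names SetDocumentTextAsync and timeoutMs are hypothetical and not part of the WebBrowser API:

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public static class WebBrowserTextExtensions
{
    // Hypothetical helper: sets DocumentText and completes once DocumentCompleted fires,
    // or throws TimeoutException after timeoutMs milliseconds.
    public static async Task SetDocumentTextAsync(this WebBrowser wb, string html, int timeoutMs = 10000)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = (s, e) => tcs.TrySetResult(true);
        wb.DocumentCompleted += handler;
        try
        {
            wb.DocumentText = html;
            // Whichever finishes first wins: the load or the delay.
            Task finished = await Task.WhenAny(tcs.Task, Task.Delay(timeoutMs));
            if (finished != tcs.Task)
            {
                throw new TimeoutException("DocumentCompleted did not fire in time.");
            }
        }
        finally
        {
            // Always detach so repeated calls do not stack handlers.
            wb.DocumentCompleted -= handler;
        }
    }
}
```

With this in place, the body of the loop would reduce to await wb.SetDocumentTextAsync(leMessage); followed by the GetElementsByTagName call.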

You need to loop over the elements only after the DocumentCompleted event has fired, so subscribe a handler for it in your code:
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// The document is loaded at this point, so loop over your elements here.
}

Try this:
WebBrowser wb;
private void Form1_Load(object sender, EventArgs e)
{
wb = new WebBrowser();
wb.DocumentCompleted += wb_DocumentCompleted;
wb.DocumentText = "<html><body><a href='#'>Test</a></body></html>";
}
void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlElementCollection elems = ((WebBrowser)sender)
.Document.GetElementsByTagName("a");
foreach (HtmlElement elem in elems)
{
// Do Some Stuff
}
}

Related

System.Windows.Forms.WebBrowser wait until page has been fully loaded

I have tried a lot of different solutions with waits and async, but nothing seems to work. I have not found a solution that actually waits until the page has fully loaded. Every version waits for some time, but not until the page has loaded, so I get an error in the next step.
How can I make the code wait until, for example, the element Document.GetElementById("quickFind_text_0") has been found on the page?
Here is my code:
private void button7_Click(object sender, EventArgs e)
{
webBrowser1.Navigate("https://company.crm4.dynamics.com/main.aspx?app=d365default&pagetype=entitylist&etn=opportunity");
webBrowser1.Document.GetElementById("shell-container").Document.GetElementById("quickFind_text_0").SetAttribute("value", "Airbus");
webBrowser1.Document.GetElementById("shell-container").Document.GetElementById("quickFind_text_0").InnerText = "Airbus";
//Thread.Sleep(2000);
HtmlElement fbLink = webBrowser1.Document.GetElementById("shell-container").Document.GetElementById("mainContent").Document.GetElementById("quickFind_button_0"); ;
fbLink.InvokeMember("click");
}
P.S. I have to do this "twice" otherwise it is not working:
webBrowser1.Document.GetElementById("shell-container").Document.GetElementById("quickFind_text_0").SetAttribute("value", "Airbus");
webBrowser1.Document.GetElementById("shell-container").Document.GetElementById("quickFind_text_0").InnerText = "Airbus";
In VBA this works:
While .Busy
DoEvents
Wend
While .ReadyState <> 4
DoEvents
Wend
Is it possible to do the same in C#?
EDIT:
My full code below. For some reason async/await does not work.
System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=v.0.0.01
StackTrace:
at v._0._0._01.Browser.<button7_Click>d__7.MoveNext() in C:\Users\PC\source\repos\v.0.0.01\v.0.0.01\Browser.cs:line 69
Here is my code:
using System;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace v._0._0._01
{
public static class WebBrowserExtensions
{
public static Task<Uri> DocumentCompletedAsync(this WebBrowser wb)
{
var tcs = new TaskCompletionSource<Uri>();
WebBrowserDocumentCompletedEventHandler handler = null;
handler = (_, e) =>
{
wb.DocumentCompleted -= handler;
tcs.TrySetResult(e.Url);
};
wb.DocumentCompleted += handler;
return tcs.Task;
}
}
public partial class Browser : Form
{
public Browser()
{
InitializeComponent();
}
private async void button7_Click(object sender, EventArgs e)
{
webBrowser1.Navigate("https://company.crm4.dynamics.com/main.aspx?app=d365default&pagetype=entitylist&etn=opportunity");
await webBrowser1.DocumentCompletedAsync(); // async magic
HtmlElement fbLink = webBrowser1.Document.GetElementById("shell-container").Document.GetElementById("mainContent").Document.GetElementById("quickFind_button_0"); ;
fbLink.InvokeMember("click");
}
}
}
I have also noticed that the IDs quickFind_text_0 and quickFind_button_0 always start with the same words, but the numbers change, for example quickFind_text_1 and quickFind_button_1, or quickFind_text_2 and quickFind_button_2. However, when clicking manually, everything works with quickFind_text_0 and quickFind_button_0.
Here is an extension method for easy awaiting of the DocumentCompleted event:
public static class WebBrowserExtensions
{
public static Task<Uri> DocumentCompletedAsync(this WebBrowser wb)
{
var tcs = new TaskCompletionSource<Uri>();
WebBrowserDocumentCompletedEventHandler handler = null;
handler = (_, e) =>
{
wb.DocumentCompleted -= handler;
tcs.TrySetResult(e.Url);
};
wb.DocumentCompleted += handler;
return tcs.Task;
}
}
It can be used like this:
private async void button1_Click(object sender, EventArgs e)
{
webBrowser1.Navigate("https://company.crm4.dynamics.com/main.aspx");
await webBrowser1.DocumentCompletedAsync(); // async magic
HtmlElement fbLink = webBrowser1.Document.GetElementById("quickFind_button_0");
fbLink.InvokeMember("click");
}
The lines after the await will run after the page has completed loading.
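For comparison, the VBA Busy/ReadyState loop from the question maps to C# roughly as follows. This is only a sketch: the extension name WaitUntilLoaded is hypothetical, and Application.DoEvents allows re-entrancy (for example a second button click while waiting), which is why the awaitable approach above is usually preferable.

```csharp
using System.Windows.Forms;

public static class WebBrowserBlockingExtensions
{
    // Hypothetical helper: rough counterpart of
    //   While .Busy : DoEvents : Wend
    //   While .ReadyState <> 4 : DoEvents : Wend
    public static void WaitUntilLoaded(this WebBrowser wb)
    {
        while (wb.IsBusy || wb.ReadyState != WebBrowserReadyState.Complete)
        {
            // Keep pumping Windows messages so the control can make progress.
            Application.DoEvents();
        }
    }
}
```

It would be called right after Navigate, as in webBrowser1.WaitUntilLoaded();. Pages that build their content with scripts after loading may still lack the target element at that point.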
Update: Here is another extension method for awaiting a specific element to appear in the page:
// Add to the static WebBrowserExtensions class; Stopwatch needs "using System.Diagnostics;".
public static async Task<HtmlElement> WaitForElementAsync(this WebBrowser wb,
string elementId, int timeout = 30000, int interval = 500)
{
var stopwatch = Stopwatch.StartNew();
while (true)
{
try
{
var element = wb.Document.GetElementById(elementId);
if (element != null) return element;
}
catch { }
if (stopwatch.ElapsedMilliseconds > timeout) throw new TimeoutException();
await Task.Delay(interval);
}
}
It can be used for example after invoking a click event that modifies the page using XMLHttpRequest:
someButton.InvokeMember("click");
var mainContentElement = await webBrowser1.WaitForElementAsync("mainContent", 5000);
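Since the question notes that the numeric suffix of the IDs changes (quickFind_button_0, quickFind_button_1, and so on), a lookup by ID prefix may be more robust than a fixed ID. A minimal sketch, assuming the element lives in the top-level document and not inside a frame; the extension name FindByIdPrefix is hypothetical:

```csharp
using System;
using System.Windows.Forms;

public static class HtmlDocumentExtensions
{
    // Hypothetical helper: returns the first element whose id starts with idPrefix,
    // or null when no such element exists in this document.
    public static HtmlElement FindByIdPrefix(this HtmlDocument document, string idPrefix)
    {
        foreach (HtmlElement element in document.All)
        {
            string id = element.Id;
            if (!string.IsNullOrEmpty(id) && id.StartsWith(idPrefix, StringComparison.Ordinal))
            {
                return element;
            }
        }
        return null;
    }
}
```

Usage could look like var button = webBrowser1.Document.FindByIdPrefix("quickFind_button_"); followed by a null check before calling InvokeMember("click").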

How can I print a pdf document from Xamarin.Forms UWP?

I have a Xamarin.Forms application that supports only UWP. I cannot find a way to print a pdf document. Whatever I have seen on the web, for some reason doesn't work for me. E.g. I tried
https://www.syncfusion.com/kb/8767/how-to-print-pdf-documents-in-xamarin-forms-platform
It lets me print, but the preview in the print dialog never shows up, and the progress indicator just keeps rotating forever.
I also tried http://zawayasoft.com/2018/03/13/uwp-print-pdf-files-silently-without-print-dialog/
This gives me errors that I cannot fix.
So I wonder if somebody can suggest something else that would actually work. Maybe something newer than what I have tried (I use VS 2017). Printing without the printing dialog would be preferable.
Thank you in advance.
I used a very dirty hack to do that!
I print an image version of the PDF (the conversion is done on the backend) and call the UWP printing code through dependency injection.
Inside my Print class in the UWP project:
class Print : IPrint
{
void IPrint.Print(byte[] content)
{
Print_UWP printing = new Print_UWP();
printing.PrintUWpAsync(content);
}
}
and the class responsible for printing in uwp:
public class Print_UWP
{
PrintManager printmgr = PrintManager.GetForCurrentView();
PrintDocument PrintDoc = null;
PrintDocument printDoc;
PrintTask Task = null;
Windows.UI.Xaml.Controls.Image ViewToPrint = new Windows.UI.Xaml.Controls.Image();
public Print_UWP()
{
printmgr.PrintTaskRequested += Printmgr_PrintTaskRequested;
}
public async void PrintUWpAsync(byte[] imageData)
{
int i = 0;
while (i < 5)
{
try
{
BitmapImage biSource = new BitmapImage();
using (InMemoryRandomAccessStream stream = new InMemoryRandomAccessStream())
{
await stream.WriteAsync(imageData.AsBuffer());
stream.Seek(0);
await biSource.SetSourceAsync(stream);
}
ViewToPrint.Source = biSource;
if (printDoc != null) // detach the handlers from the previous document, if any
{
printDoc.GetPreviewPage -= PrintDoc_GetPreviewPage;
printDoc.Paginate -= PrintDoc_Paginate;
printDoc.AddPages -= PrintDoc_AddPages;
}
this.printDoc = new PrintDocument();
try
{
printDoc.GetPreviewPage += PrintDoc_GetPreviewPage;
printDoc.Paginate += PrintDoc_Paginate;
printDoc.AddPages += PrintDoc_AddPages;
bool showprint = await PrintManager.ShowPrintUIAsync();
}
catch (Exception e)
{
Debug.WriteLine(e.ToString());
}
// printmgr = null;
// printDoc = null;
// Task = null;
PrintDoc = null;
GC.Collect();
printmgr.PrintTaskRequested -= Printmgr_PrintTaskRequested;
break;
}
catch (Exception e)
{
i++;
}
}
}
private void Printmgr_PrintTaskRequested(PrintManager sender, PrintTaskRequestedEventArgs args)
{
var deff = args.Request.GetDeferral();
Task = args.Request.CreatePrintTask("Invoice", OnPrintTaskSourceRequested);
deff.Complete();
}
async void OnPrintTaskSourceRequested(PrintTaskSourceRequestedArgs args)
{
var def = args.GetDeferral();
await Windows.ApplicationModel.Core.CoreApplication.MainView.CoreWindow.Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
{
args.SetSource(printDoc.DocumentSource);
});
def.Complete();
}
private void PrintDoc_AddPages(object sender, AddPagesEventArgs e)
{
printDoc.AddPage(ViewToPrint);
printDoc.AddPagesComplete();
}
private void PrintDoc_Paginate(object sender, PaginateEventArgs e)
{
PrintTaskOptions opt = Task.Options;
printDoc.SetPreviewPageCount(1, PreviewPageCountType.Final);
}
private void PrintDoc_GetPreviewPage(object sender, GetPreviewPageEventArgs e)
{
printDoc.SetPreviewPage(e.PageNumber, ViewToPrint);
}
}
Please note that this is not a perfect solution: it sometimes crashes without my being able to trace the exception (which is really strange). It does the job, but I am sure there must be better answers.

Print Multiple Pages From a UWA

I have about 8 records that I want to print in one batch, each on a separate page. However, the UWP sample for this uses over 600 lines of code to accomplish it. It seems to me that it has to be much, much easier than that. I thought all we'd have to do is add each page to the PrintDocument and send the print job. Apparently not. I'm using this:
async void Print()
{
var printDocument = new PrintDocument();
var printDocumentSource = printDocument.DocumentSource;
var printMan = PrintManager.GetForCurrentView();
printMan.PrintTaskRequested += PrintTaskRequested;
var pages = new List<Page>();
foreach (var item in items)
{
// Set up variables
var printPage = new PageToPrint() { /* Set properties */ };
printPage.Set_Up(); //Set up fields
pages.Add(printPage);
}
printDocument.SetPreviewPage(1, page);
printDocument.SetPreviewPageCount(pages.Count, PreviewPageCountType.Final);
foreach (var page in pages)
{
printDocument.AddPage(page);
}
printDocument.AddPagesComplete();
await PrintManager.ShowPrintUIAsync();
}
void PrintTaskRequested(PrintManager sender, PrintTaskRequestedEventArgs e)
{
PrintTask printTask = null;
printTask = e.Request.CreatePrintTask("Kimble Print Job", sourceRequested =>
{
printTask.Completed += PrintTask_Completed;
sourceRequested.SetSource(printDocumentSource);
});
}
private async void PrintTask_Completed(PrintTask sender, PrintTaskCompletedEventArgs args)
{
await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
{
PrintManager printMan = PrintManager.GetForCurrentView();
printMan.PrintTaskRequested -= PrintTaskRequested;
});
}
However, it won't generate the print preview. It just sits there spinning and spinning, and if I hit "print" it doesn't succeed (PDF can't open, job never gets to a physical printer.)
I was hoping printing would be at least reasonably easy with the PrintDocument, and I still think it looks like it should be. Am I just missing it here, or does it really take 600+ lines of code to dispatch a simple print job?
"However, it won't generate the print preview."
This is because the call printDocument.SetPreviewPage(1, page); must be made inside the printDocument.GetPreviewPage event handler, so you need to register that handler first. The same applies to the printDocument.AddPages event handler. Your code mixes the handler registration and the callback work into a single method. Here I changed your code a little, and in my test it works well.
protected PrintDocument printDocument;
protected IPrintDocumentSource printDocumentSource;
List<Page> pages = new List<Page>();
Page printPage = new PageToPrint();
public MainPage()
{
this.InitializeComponent();
RegisterForPrinting();
}
private async void BtnPrint_Click(object sender, RoutedEventArgs e)
{
await PrintManager.ShowPrintUIAsync();
}
public void RegisterForPrinting()
{
printDocument = new PrintDocument();
printDocumentSource = printDocument.DocumentSource;
pages.Add(printPage);
printDocument.GetPreviewPage += GetPrintPreviewPage;
printDocument.AddPages += AddPrintPages;
PrintManager printMan = PrintManager.GetForCurrentView();
printMan.PrintTaskRequested += PrintTaskRequested;
}
private void AddPrintPages(object sender, AddPagesEventArgs e)
{
foreach (var page in pages)
{
printDocument.AddPage(page);
}
printDocument.AddPagesComplete();
}
private void GetPrintPreviewPage(object sender, GetPreviewPageEventArgs e)
{
// e.PageNumber is 1-based; show the page the preview actually asked for.
printDocument.SetPreviewPage(e.PageNumber, pages[e.PageNumber - 1]);
printDocument.SetPreviewPageCount(pages.Count, PreviewPageCountType.Final);
}
void PrintTaskRequested(PrintManager sender, PrintTaskRequestedEventArgs e)
{
PrintTask printTask = null;
printTask = e.Request.CreatePrintTask("Kimble Print Job", sourceRequested =>
{
printTask.Completed += PrintTask_Completed;
sourceRequested.SetSource(printDocumentSource);
});
}
private async void PrintTask_Completed(PrintTask sender, PrintTaskCompletedEventArgs args)
{
await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
{
PrintManager printMan = PrintManager.GetForCurrentView();
printMan.PrintTaskRequested -= PrintTaskRequested;
});
}
Although you may not need all the code of the sample, I recommend that you follow the structure of the official sample and build a PrintHelper class.
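As a rough idea of how small such a helper can be, here is a sketch that keeps the same three PrintDocument events as the official sample but drops everything else. The class name SimplePrintHelper and its members are hypothetical, it assumes it is created and used on the UI thread, and it assumes the page elements are not already part of the visible tree:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Graphics.Printing;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Printing;

// Hypothetical minimal helper: one UIElement per printed page.
public sealed class SimplePrintHelper
{
    private readonly PrintDocument printDocument = new PrintDocument();
    private readonly IPrintDocumentSource printDocumentSource;
    private readonly IReadOnlyList<UIElement> pages;

    public SimplePrintHelper(IReadOnlyList<UIElement> pages)
    {
        this.pages = pages;
        printDocumentSource = printDocument.DocumentSource;
        // Register all handlers before the print UI is shown.
        printDocument.Paginate += OnPaginate;
        printDocument.GetPreviewPage += OnGetPreviewPage;
        printDocument.AddPages += OnAddPages;
        PrintManager.GetForCurrentView().PrintTaskRequested += OnPrintTaskRequested;
    }

    public async Task ShowPrintUIAsync()
    {
        await PrintManager.ShowPrintUIAsync();
    }

    public void Unregister()
    {
        PrintManager.GetForCurrentView().PrintTaskRequested -= OnPrintTaskRequested;
    }

    private void OnPaginate(object sender, PaginateEventArgs e)
    {
        // Tell the preview how many pages exist.
        printDocument.SetPreviewPageCount(pages.Count, PreviewPageCountType.Final);
    }

    private void OnGetPreviewPage(object sender, GetPreviewPageEventArgs e)
    {
        // PageNumber is 1-based, the list is 0-based.
        printDocument.SetPreviewPage(e.PageNumber, pages[e.PageNumber - 1]);
    }

    private void OnAddPages(object sender, AddPagesEventArgs e)
    {
        foreach (UIElement page in pages)
        {
            printDocument.AddPage(page);
        }
        printDocument.AddPagesComplete();
    }

    private void OnPrintTaskRequested(PrintManager sender, PrintTaskRequestedEventArgs e)
    {
        e.Request.CreatePrintTask("Print Job",
            sourceRequested => sourceRequested.SetSource(printDocumentSource));
    }
}
```

A page could then create the helper once with its list of pages, call await helper.ShowPrintUIAsync() from the print button handler, and call Unregister() when it is navigated away from.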

C# WebBrowser Run Javascript - Return blank page with the result of the Javascript - Why?

When I try to run JavaScript in a WebBrowser control, the code runs, but after it has executed I am taken to a blank white page with a numeric value (in this case "1000") in the upper right corner, and away from the site I was on previously:
HtmlElement head = webBrowser1.Document.GetElementsByTagName("head")[0];
HtmlElement scriptEl = webBrowser1.Document.CreateElement("script");
IHTMLScriptElement element = (IHTMLScriptElement)scriptEl.DomElement;
element.text = "function ScrollDown() { document.getElementsByClassName('scrollableitemclass').scrollTop = 1000 }";
head.AppendChild(scriptEl);
webBrowser1.Document.InvokeScript("ScrollDown");
Thank you for the help
You can scroll to your HTML element with HtmlElement.ScrollIntoView.
See this example:
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.DocumentText = "<html><body><span class=\"cls\" id=\"el\"> </span></body></html>";
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//for element with id
webBrowser1.Document.GetElementById("el").ScrollIntoView(true);
//for elements with a specific class
foreach (HtmlElement el in webBrowser1.Document.All)
{
if (el.GetAttribute("ClassName") == "cls")
{
el.ScrollIntoView(true);
}
}
}
}
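If the goal is to scroll the content inside a scrollable container, as the script in the question attempts, rather than to bring an element into view, HtmlElement also exposes a ScrollTop property that can be set from C# without injecting any script. Note that in the question's script, getElementsByClassName returns a collection, so assigning scrollTop to it does not scroll any element. A minimal sketch; the helper name SetScrollTopByClass is hypothetical, and it only matches elements whose class attribute is exactly the given name:

```csharp
using System.Windows.Forms;

public static class HtmlScrollHelper
{
    // Hypothetical helper: sets ScrollTop on every element whose class attribute
    // equals className and returns how many elements were changed.
    public static int SetScrollTopByClass(HtmlDocument document, string className, int offset)
    {
        int changed = 0;
        foreach (HtmlElement el in document.All)
        {
            if (el.GetAttribute("className") == className)
            {
                el.ScrollTop = offset;
                changed++;
            }
        }
        return changed;
    }
}
```

Called from the DocumentCompleted handler, this might look like HtmlScrollHelper.SetScrollTopByClass(webBrowser1.Document, "scrollableitemclass", 1000);.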

WebBrowser - empty DocumentText

I'm trying to use WebBrowser class, but of course it doesn't work.
My code:
WebBrowser browser = new WebBrowser();
browser.Navigate("http://www.google.com");
while(browser.DocumentText == "")
{
continue;
}
string html = browser.DocumentText;
browser.DocumentText is always "". Why?
You should use the DocumentCompleted event, and if you don't have a WinForms application (for example, in a console application), an ApplicationContext might also be needed.
static class Program
{
[STAThread]
static void Main()
{
Context ctx = new Context();
Application.Run(ctx);
// ctx.Html; -- your html
}
}
class Context : ApplicationContext
{
public string Html { get; set; }
public Context()
{
WebBrowser browser = new WebBrowser();
browser.AllowNavigation = true;
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
browser.Navigate("http://www.google.com");
}
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
Html = ((WebBrowser)sender).DocumentText;
this.ExitThread();
}
}
The WebBrowser isn't going to do its job until the current thread finishes its work. If you change the code to something like this:
WebBrowser browser = new WebBrowser();
// Subscribe first, then start navigating.
browser.Navigated += (s, e) =>
{
var html = browser.DocumentText;
};
browser.Navigate("http://www.google.com");
The variable will be set.
But, as others have mentioned, the document completed is a better event to attach to, as at that time, the entire document will be completed (appropriate name!)
WebBrowser browser = new WebBrowser();
// Subscribe first, then start navigating.
browser.DocumentCompleted += (s, e) =>
{
var html = browser.DocumentText;
html.ToString();
};
browser.Navigate("http://www.google.com");
Attach to the DocumentCompleted event, the code is as below
browser.DocumentCompleted += (s, e) =>
{
string html = browser.DocumentText;
};
If you need the DocumentText you should handle the DocumentCompleted event
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
See event below
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = (WebBrowser)sender;
string text = wb.DocumentText;
}
Try something like this
string html = "http://www.google.com/";
string url = html;
if (!url.StartsWith("http://") && !url.StartsWith("https://"))
{
url = "http://" + url;
}
browser.Navigate(new Uri(url));
Adapt it to your while loop where necessary.
