I'm using this code in a Windows desktop app to read the values of a combobox on a web page. Afterwards I need to select one of those values, which makes the page update itself with new information using JavaScript.
private WebBrowser withEventsField_wb;
WebBrowser wb {
get { return withEventsField_wb; }
set {
if (withEventsField_wb != null) {
withEventsField_wb.Navigated -= navigated;
}
withEventsField_wb = value;
if (withEventsField_wb != null) {
withEventsField_wb.Navigated += navigated;
}
}
}
private void Form1_Load(object sender, EventArgs e)
{
wb = new WebBrowser();
wb.Navigate("https://academicos.ubi.pt/online/horarios.aspx?p=a");
}
private void navigated(object sender, WebBrowserNavigatedEventArgs e)
{
HtmlElementCollection allelements = wb.Document.All;
HtmlElement year = default(HtmlElement);
foreach (HtmlElement webpageelement in allelements) {
if (webpageelement.GetAttribute("id").Contains("ContentPlaceHolder1_ddlAnoLect") == true) {
year = webpageelement;
HtmlElementCollection yoptions = year.Children;
foreach (HtmlElement yopt in yoptions) {
ComboBox1.Items.Add(yopt.InnerText);
}
}
}
}
Now I'm trying to do the same in a Universal App (Windows Phone/Windows), but I have been unable to. I know that I have to use HttpClient, but it does not work like a WebBrowser. The web browser is created only in code, to get all the data needed, and for each step of data that I need to retrieve the website does not refresh normally but uses jQuery to load the new information.
Any help?
Well, after a lot of searching I found something that helps and that even gave me another idea:
http://blog.gauravchouhan.com/tag/advance-web-scraping-using-c/
Related
I have to change the inner HTML of a page before showing it in the WebBrowser.
Test page - http://aksmod.ru/skajrim-mod-kukri-ot-aksyonov-v5-0/
I tried to use AngleSharp.Scripting, but it doesn't work correctly (the ads don't load):
var config = new Configuration().WithDefaultLoader().WithJavaScript();
var document = BrowsingContext.New(config).OpenAsync(address).Result;
//do something
return document.DocumentElement.OuterHtml;
Later I thought about LoadCompleted, but the result was the same:
private void Wb_LoadCompleted(object sender, NavigationEventArgs e)
{
Console.WriteLine("Loaded");
string url = e.Uri.ToString();
if (!(url.StartsWith("http://") || url.StartsWith("https://")))
{ }
if (e.Uri.AbsolutePath != wb.Source.AbsolutePath)
{ }
else
{
Console.WriteLine("Full Loaded");
HTMLDocument html = (HTMLDocument)wb.Document;
var value = html.getElementsByTagName("html").item(index: 0);
//do something
wb.NavigateToString(value.OuterHtml);
}
}
The event just doesn't fire (although it works fine for some other sites).
So, what am I missing?
Update 1
MCVE
XAML
<Grid>
<WebBrowser Name="wb" />
</Grid>
Code behind
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
wb.Navigated += Wb_Navigated;
wb.LoadCompleted += Wb_LoadCompleted;
wb.Navigate("http://aksmod.ru/skajrim-mod-kukri-ot-aksyonov-v5-0/");
}
private void Wb_LoadCompleted(object sender, NavigationEventArgs e)
{
Console.WriteLine("Loaded");
string url = e.Uri.ToString();
if (!(url.StartsWith("http://") || url.StartsWith("https://")))
{ }
if (e.Uri.AbsolutePath != wb.Source.AbsolutePath)
{ }
else
{
Console.WriteLine("Full Loaded");
HTMLDocument html = (HTMLDocument)wb.Document;
var value = html.getElementsByTagName("html").item(index: 0);
//do something
wb.NavigateToString(value.OuterHtml);
}
}
private void Wb_Navigated(object sender, NavigationEventArgs e)
{
FieldInfo fiComWebBrowser = typeof(WebBrowser)
.GetField("_axIWebBrowser2",
BindingFlags.Instance | BindingFlags.NonPublic);
if (fiComWebBrowser == null) return;
object objComWebBrowser = fiComWebBrowser.GetValue(wb);
if (objComWebBrowser == null) return;
objComWebBrowser.GetType().InvokeMember(
"Silent", BindingFlags.SetProperty, null, objComWebBrowser,
new object[] { true });
Console.WriteLine("Navigated");
}
}
The ads are embedded as an iframe within the page you presented. In my case, the ad URL loaded in the iframe is something like https://cdn.254a.com/images/hosted/elv/retargeting/v5/728x90.html?... (check with the web browser's inspector tool).
Probably the ad does not allow itself to be framed in your page (check what the ad returns in the X-Frame-Options header field). If this is the issue, it should be possible to implement a proxy for the ad and let the proxy change the X-Frame-Options header.
In this case, if the ad URL is https (and not just http), you'd need to create a proxy that acts as a man-in-the-middle. See the accepted answer to "What's the point of the X-Frame-Options header?". But you could replace the URL with your proxy URL, passing the original URL as an argument: the proxy acts as the HTTPS client, fetches the content, modifies the header, and returns the content to your page over plain HTTP.
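To check what the ad returns, the header can be read directly. The following is a minimal sketch, not a definitive implementation: the class and method names (FrameHeaderCheck, BlocksCrossOriginFraming, GetXFrameOptionsAsync) are made up for illustration, and it only looks at X-Frame-Options, so a framing restriction expressed through another header would not be detected.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class FrameHeaderCheck
{
    // Returns true when the header value forbids embedding the resource
    // in a page from a different origin (DENY or SAMEORIGIN).
    public static bool BlocksCrossOriginFraming(string xFrameOptions)
    {
        if (string.IsNullOrWhiteSpace(xFrameOptions)) return false;
        string value = xFrameOptions.Trim();
        return value.Equals("DENY", StringComparison.OrdinalIgnoreCase)
            || value.Equals("SAMEORIGIN", StringComparison.OrdinalIgnoreCase);
    }

    // Requests the URL and returns the X-Frame-Options value,
    // or null when the response does not carry that header.
    public static async Task<string> GetXFrameOptionsAsync(string url)
    {
        using (var client = new HttpClient())
        using (var response = await client.GetAsync(
            url, HttpCompletionOption.ResponseHeadersRead))
        {
            return response.Headers.TryGetValues("X-Frame-Options", out var values)
                ? values.FirstOrDefault()
                : null;
        }
    }
}
```

If BlocksCrossOriginFraming returns true for the value returned by the ad URL, the browser control will refuse to render the ad inside the iframe, which would match the symptom described in the question.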
You can use http://html-agility-pack.net to manipulate the HTML code in C#.
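As a rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced: the HtmlRewriter class and the choice to strip iframe elements are only illustrative, and the actual change you need to make to the page will differ.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, shown only to illustrate the library calls.
public static class HtmlRewriter
{
    // Parses the HTML, removes every <iframe> element, and returns the result.
    public static string RemoveIframes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var iframes = doc.DocumentNode.SelectNodes("//iframe");
        if (iframes != null)
        {
            // Copy to a list first so removal cannot disturb the enumeration.
            foreach (var node in iframes.ToList())
            {
                node.Remove();
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

The returned string could then be passed to wb.NavigateToString(...), as in the question's code. Note that HtmlAgilityPack only parses markup; it does not run the page's scripts.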
I have this small piece of C# .NET code that publishes tweets and shows the Twitter timeline using Tweetinvi. I'd like to update the timeline automatically whenever a tweet is sent. Can anyone advise how to do this with an event? Thanks for your answers.
private void button1click(object sender, EventArgs e)
{
if (richTextBox1.Text != "")
{
Tweet.PublishTweet(richTextBox1.Text);
MessageBox.Show("Your tweet was sent!", "Important Message");
}
else
{
MessageBox.Show("You need to write something!", "Important Message");
}
}
private void Timeline_GetHomeTimeline(object sender, EventArgs e)
{
var loggedUser = User.GetLoggedUser();
string x = "";
var homeTimelineTweets = loggedUser.GetHomeTimeline();
foreach (var tweet in homeTimelineTweets)
{
x += tweet.Text + Environment.NewLine;
}
richTextBox2.Text = x;
}
First of all, please note that it is very bad practice to call User.GetLoggedUser() repeatedly, because the endpoint is limited to 15 requests every 15 minutes (1 per minute).
If the user happens to publish more than 15 tweets in 15 minutes, your code will break.
Now you have multiple solutions to solve the problem, but the best one is the UserStream (solution 1).
Solution 1
I would suggest adding the following code in the Initialized event.
var us = Stream.CreateUserStream();
us.TweetCreatedByMe += (sender, args) =>
{
// Update your rich textbox by adding the new tweet with tweet.Text
var tweetPublishedByMe = args.Tweet;
// OR Get your timeline and rewrite the text entirely in your textbox
var userTimeline = Timeline.GetHomeTimeline();
if (userTimeline != null)
{
// foreach ...
}
};
us.StartStreamAsync();
Solution 2
If you do not need to reload your timeline each time the user publishes a tweet, but you do need the new tweet to be displayed, use the following solution.
var tweet = Tweet.PublishTweet("hello");
if (tweet != null)
{
// Update your rich textbox
}
Solution 3
Update your timeline if a tweet has been published successfully.
var tweet = Tweet.PublishTweet("hello");
if (tweet != null)
{
var userTimeline = Timeline.GetHomeTimeline();
if (userTimeline != null)
{
// foreach ...
}
}
NOTE Please note that I have never had the need to retrieve the LoggedUser at any point. Most of the time a LoggedUser should be retrieved once and then used across your app.
Also please note that I am the main developer of Tweetinvi.
I want to get the HTML code of a website. In a browser I can usually just click "View Page Source" in the context menu or something similar. But how can I automate it? I've tried the WebBrowser class, but sometimes it doesn't work. I am not a web developer, so I don't really know whether my approach even makes sense. I think the main problem is that I sometimes get HTML for which not all of the page's code has been executed, so it is incomplete. I have this problem with, for example, this site: http://www.sreality.cz/en/search/for-sale/praha
My code (I’ve tried to make it small but runnable on its own):
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Windows.Forms;
namespace WebBrowserForm
{
internal static class Program
{
[STAThread]
private static void Main()
{
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
for (int i = 0; i < 10; i++)
{
Form1 f = new Form1();
f.ShowDialog();
}
// Now I can check Form1.List and see that some html is final and some is not
}
}
public class Form1 : Form
{
public static List<string> List = new List<string>();
private const string Url = "http://www.sreality.cz/en/search/for-sale/praha";
private System.Windows.Forms.WebBrowser webBrowser1;
public Form1()
{
this.webBrowser1 = new System.Windows.Forms.WebBrowser();
this.SuspendLayout();
this.webBrowser1.Dock = System.Windows.Forms.DockStyle.Fill;
this.webBrowser1.Name = "webBrowser1";
this.webBrowser1.TabIndex = 0;
this.ResumeLayout(false);
Load += new EventHandler(Form1_Load);
this.webBrowser1.ObjectForScripting = new MyScript();
}
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.Navigate(Url);
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
// Final html for 99% of web pages, but unfortunately not for all
string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
Application.DoEvents();
webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
Application.DoEvents();
}
}
[ComVisible(true)]
public class MyScript
{
public void CallServerSideCode()
{
HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
// here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
List.Add(renderedHtml);
((Form1)Application.OpenForms[0]).Close();
}
}
}
}
I would expect to get the final HTML in the webBrowser1_DocumentCompleted method. It usually works, but not with this site. So I tried to get the HTML from my own code, which should be executed within the web page: the CallServerSideCode method. What is strange is that sometimes I get the final HTML (basically the same as when I do it manually in a browser), but sometimes I don't. I think the problem is that my script starts before the whole page is rendered instead of after. But I am not really sure, since this kind of thing is far from my comfort zone and I don't really understand what I am doing. I'm just trying to apply something I found on the internet.
So, does anyone know what is wrong with the code? Or, even more importantly, how to easily get the final HTML from the site?
Any help is appreciated.
You should use the WebClient class to download the HTML page. No display control is necessary.
You want the DownloadString method.
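A minimal sketch of that suggestion follows; the PageDownloader wrapper and the UTF-8 encoding are assumptions made for illustration. Keep in mind that DownloadString returns the markup exactly as the server sends it and does not execute any scripts, so content that the page builds with JavaScript will not be present in the result.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical wrapper around WebClient.DownloadString.
public static class PageDownloader
{
    // Downloads the resource at the given URL and returns it as a string.
    public static string DownloadHtml(string url)
    {
        using (var client = new WebClient())
        {
            // Assumption: the page is UTF-8 encoded; adjust for other sites.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

For example, PageDownloader.DownloadHtml("http://www.sreality.cz/en/search/for-sale/praha") would return the initial page source, comparable to "View Page Source" rather than to the rendered DOM.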
Maybe it will help if you append the call to your external function to the end of the body and wrap it in jQuery's DOM-ready function. I mean something like this:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
// Final html for 99% of web pages, but unfortunately not for all
string tst = webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
webBrowser1.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
HtmlElement body = webBrowser1.Document.GetElementsByTagName("body")[0];
HtmlElement scriptEl = webBrowser1.Document.CreateElement("script");
IHTMLScriptElement element = (IHTMLScriptElement)scriptEl.DomElement;
element.text = "$(function() { window.external.CallServerSideCode(); });";
body.AppendChild(scriptEl);
}
}
[ComVisible(true)]
public class MyScript
{
public void CallServerSideCode()
{
HtmlDocument doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
string renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
// here I sometimes get full html but sometimes the same as in webBrowser1_DocumentCompleted method
List.Add(renderedHtml);
((Form1)Application.OpenForms[0]).Close();
}
}
I am experiencing a leak of some sort when using the WebBrowser object. I am still searching all over the place for answers. I've seen some similar questions on this forum as well, but I can't see how to apply those findings to my case.
After a page loads, the DocumentCompleted event fires and I parse the HTML on the page:
void PageScrollTimerTick(object sender, EventArgs e)
{
String pageSrc = webBrowser1.Document.Body.InnerHtml;
// Check if we need to stop scrolling..
if (m_iLastFramePageLength == pageSrc.Length)
{
m_iLastFramePageLength = 0;
m_scrollTimer.Tick -= PageScrollTimerTick;
m_scrollTimer.Enabled = false;
parsePage();
nextPage();
}
else
{
m_iLastFramePageLength = pageSrc.Length;
webBrowser1.Document.Window.ScrollTo(0, webBrowser1.Document.Body.ScrollRectangle.Height);
}
}
The Leak:
As I type this, I wonder: why these functions? I have 6 different functions that do very similar tasks. I think they have problems because they are executed from a timer, which probably uses a different thread. Am I close? How can I resolve this? Perhaps with Invoke() on the web browser control?
doParse():
List<String> doSomeExtractions()
{
List<String> retVal = new List<String>();
foreach (HtmlElement div in webBrowser1.Document.GetElementsByTagName("div"))
{
String szClassName = div.GetAttribute("classname");
switch (szClassName)
{
case "someDivClass":
{
if (div.InnerHtml.Contains("<b>"))
{
retVal.Add(div.InnerHtml);
}
break;
}
default:
{
break;
}
}
}
return retVal;
}
moveNext():
// Store data, navigate to next page.
webBrowser1.DocumentCompleted += this.scrapeData;
webBrowser1.Navigate("about:blank");
I'm developing a Windows Phone app that uses the older WP7 Microsoft.Phone.Controls.Maps.Map / Bing Map control.
The map tiles are being served from a local source, so the app does not need a network connection to work. Unfortunately, the map control insists on showing an "Unable to contact Server. Please try again later." message over the map when offline.
Does anyone know of a method to remove / hide this message?
Just in case you're curious: I'm developing a WP8 app but using the deprecated WP7 Bing map control, as the new WP8 map control provides no method for replacing the Bing base map.
I think this may suit you better:
void YourPage_Loaded(object sender, RoutedEventArgs e)
{
m_Map.ZoomLevel = 11;
m_Map.LayoutUpdated += m_Map_LayoutUpdated;
}
void m_Map_LayoutUpdated(object sender, EventArgs e)
{
if (!isRemoved)
{
RemoveOverlayTextBlock();
}
}
void RemoveOverlayTextBlock()
{
var textBlock = m_Map.DescendantsAndSelf.OfType<TextBlock>()
.SingleOrDefault(d => d.Text.Contains("Invalid Credentials") ||
d.Text.Contains("Unable to contact Server"));
if (textBlock != null)
{
var parentBorder = textBlock.Parent as Border;
if (parentBorder != null)
{
parentBorder.Visibility = Visibility.Collapsed;
}
isRemoved = true;
}
}
You have to include the LinqToVisualTree class, which can be downloaded from here.
And here is the original post
You can either handle the LoadingError event per instance or extend the Map control yourself as described in this post. You can then remove the layer that contains the error message so that it's not shown to the user.
public partial class CachedMap : Map
{
public CachedMap() : base()
{
base.LoadingError += (s, e) =>
{
base.RootLayer.Children.RemoveAt(5);
};
}
}
I know it's a very old thread, but anyway...
You can listen for the LoadingError event as suggested by @keyboardP, search for the LoadingErrorMessage control in the visual tree, and simply hide it.
Map.LoadingError += MapOnLoadingError;
private void MapOnLoadingError(object sender, LoadingErrorEventArgs e)
{
var errorMessage = Map.FindChildOfType<LoadingErrorMessage>();
errorMessage.Visibility = Visibility.Collapsed;
}