Get "real" HTML source from website - c#

So, I've come across an issue where my favorite radio station plays a song I don't know while I'm driving. They don't have one of those pages that shows a list of songs that they've played; however, they do have a "Now Playing" section on their site that shows what's currently playing and by whom. So, I am trying to write a small program that will poll the site every 2 minutes to retrieve the name of the song and the artist. Using Chrome dev tools, I can see the song title and artist in the source. But when I view the page source, it doesn't show up. They are using JavaScript to display that info. I've tried the following:
private void button1_Click(object sender, EventArgs e)
{
webBrowser1.Navigate(@"http://www.thebuzz.com/main.html");
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser1_DocumentCompleted);
}
private void webBrowser1_DocumentCompleted(object sender,
WebBrowserDocumentCompletedEventArgs e)
{
do
{
// Do nothing while we wait for the page to load
}
while (webBrowser1.ReadyState == WebBrowserReadyState.Loading);
var test = webBrowser1.DocumentText;
textBox1.Text = test.ToString();
}
Essentially, I'm loading it into a WebBrowser and trying to get the source this way. But I'm still not getting the part after the javascript is run. Is there a way to actually retrieve the rendered HTML after the fact?
EDIT
Also, is there a way in the WebBrowser to allow scripts to run? I get popups asking me if I want to allow them to run. I don't want to suppress them, I need them to run.

As Jay Tomten said in the comments, you're trying to fix the result of your problem, not the cause. The cause of the problem is that they're using Javascript to update that part of the page. Instead of working around that by letting the Javascript do its update and then reading what it wrote, ask yourself where the Javascript is getting the info from and whether you can go to the same place. Open up something that lets you see web traffic - Fiddler, or Chrome's dev console, for example. Watch for POST calls. One of them will likely be an AJAX request in which the Javascript on the page is getting the current song. Note the URL, inspect the call to see what parameters it sends and what data it gets back. You can use Postman or something like it to assemble a POST request and work out how the Javascript on that site is getting its data, and then write a little code to make your own call to that URL and parse what comes back.
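The approach described above can be sketched roughly as follows. Everything specific here is an assumption for illustration: the endpoint URL is a placeholder, the JSON field names `title` and `artist` are invented, and the request is a plain GET (if the network trace shows a POST, `HttpClient.PostAsync` would take its place). It also assumes `System.Text.Json` is available, which is built into .NET Core 3.0 and later and is a NuGet package on .NET Framework.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class NowPlayingPoller
{
    // Hypothetical endpoint: replace with the URL found in Fiddler or the dev console.
    private const string EndpointUrl = "http://www.example.com/nowplaying.json";

    private static readonly HttpClient Client = new HttpClient();

    // Assumes a response shaped like {"title":"...","artist":"..."}.
    // The real payload will almost certainly differ; adjust the property names.
    public static (string Title, string Artist) Parse(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            string title = root.GetProperty("title").GetString() ?? "";
            string artist = root.GetProperty("artist").GetString() ?? "";
            return (title, artist);
        }
    }

    // Polls the endpoint at the given interval and prints each new song once.
    public static async Task PollAsync(TimeSpan interval, CancellationToken token)
    {
        string last = "";
        while (!token.IsCancellationRequested)
        {
            string json = await Client.GetStringAsync(EndpointUrl);
            var song = Parse(json);
            string line = song.Artist + " - " + song.Title;
            if (line != last)
            {
                Console.WriteLine(line);
                last = line;
            }
            await Task.Delay(interval, token);
        }
    }
}
```

Calling `NowPlayingPoller.PollAsync(TimeSpan.FromMinutes(2), token)` would match the two-minute polling from the question, with no browser control involved.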

Related

WebBrowser Control Retrieving jQuery Text

I am trying to detect when the website displays the following message from a jQuery event. Initially this HTML isn't present in the page's HTML.
<div id="toast-container" class="toast-top-right"><div class="toast toast-error" aria-live="assertive" style="display: block;"><div class="toast-message">Check email & password.</div></div></div>
My assumption is, that the webBrowser1.DocumentText.Contains is only looking from the initial load of the content.
So I thought maybe some sort of timer would work every 5 seconds, looking to see if the code has changed - but I don't even think this is right as it's checking the code that's already loaded repeatedly?
private void timer2_Tick(object sender, EventArgs e)
{
// Checks for any errors on sign in page
if (webBrowser1.DocumentText.Contains("toast toast-error"))
{
// Toast Notifications
var signinErrorNotification = new Notification("Error", "Please check your email and password are correct.", 50, FormAnimator.AnimationMethod.Fade, FormAnimator.AnimationDirection.Left);
signinErrorNotification.Show();
}
}
How do I go about getting the latest HTML after it has been changed by jQuery?
P.S. My c# level is beginner.
The Document property should give you what you need.
Notice that the docs for DocumentText say
Gets or sets the HTML contents of the page displayed in the WebBrowser control.
For Document they say
Gets an HtmlDocument representing the Web page currently displayed in the WebBrowser control.
To me that's saying that DocumentText is like the starting document and Document is the current DOM. Also see https://learn.microsoft.com/en-us/dotnet/framework/winforms/controls/how-to-access-the-managed-html-document-object-model
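A minimal sketch of reading the live DOM through `Document` instead of `DocumentText`, assuming the element ids and class names from the question's HTML. The helper name `ToastChecker` is made up for this example, and `GetAttribute("className")` is used on the assumption that the control runs in an IE mode where the DOM property name is what works.

```csharp
using System.Windows.Forms;

public static class ToastChecker
{
    // Returns the text of the toast message if it is currently in the DOM,
    // or null when no toast is being shown.
    public static string GetToastMessage(WebBrowser browser)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null)
        {
            return null;
        }

        // "toast-container" is the id from the HTML shown in the question.
        HtmlElement container = doc.GetElementById("toast-container");
        if (container == null)
        {
            return null;
        }

        foreach (HtmlElement div in container.GetElementsByTagName("div"))
        {
            // Look for the inner element carrying the "toast-message" class.
            if (div.GetAttribute("className").Contains("toast-message"))
            {
                return div.InnerText;
            }
        }
        return null;
    }
}
```

Inside `timer2_Tick`, the `DocumentText.Contains(...)` check could then be replaced by `ToastChecker.GetToastMessage(webBrowser1) != null`.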

Stop multiple WebBrowsers from starting yt-videos?

I'm working with a WinForms TabControl showing WebBrowser controls to display YouTube videos.
However with two videos or more it becomes really annoying as all videos start directly.
I basically need to find out if there is a JS function, html code or a simple WebBrowser property to change, so videos are paused.
It might come in handy to find something like that for video quality too.
Has anybody ever heard of/seen where this option is stored? Or maybe the Js function itself being invoked when manually setting the quality?
EDIT:
b.DocumentCompleted += delegate { b.DocumentText=b.DocumentText.Insert(b.DocumentText.IndexOf("class=\"video-stream html5-main-video\""), "autoplay=false ");};
b.Url = new System.Uri(inp[s], System.UriKind.Absolute);
Basically this should add a new Event handler on each webbrowser form that modifies the DocumentText when the Uri that is called during creation has loaded.
Even though the browser debugger shows
<video tabindex="-1" class="video-stream html5-main-video" controlslist="nodownload" style=... src=...></video>
this isn't in the actual source code.
However I found
$oa=function(a){g.S(a.o,"video-stream");g.S(a.o,"html5-main-video");var b=a.app.g;b.zc&&a.o.setAttribute("data-no-fullscreen",!0);b.Oh&&(a.o.setAttribute("webkit-playsinline",""),a.o.setAttribute("playsinline",""));b.Nr&&a.o&&a.P(a.o,"click",a.o.play,a.o)};
in the base.js. Is it possible that youtube generates the html from the js?
How can I modify the video-tag attributes then?
I tried to change when the event handler manipulates the video tag, since DocumentCompleted events may also be raised by scripts or anything else.
delegate (object sender, System.Windows.Forms.WebBrowserDocumentCompletedEventArgs e){
if (e.Url.AbsolutePath != ((System.Windows.Forms.WebBrowser)sender).Url.AbsolutePath){
//...
However, it still fails, as there is no occurrence of that specific class on the video tag.
I have now dodged this by loading the URL only when the browser tab is selected. If someone finds a real solution, feel free to share.

WebBroswer_DocumentCompleted event is not working

I have a form with a browser control. (this control uses IE9 because I set values on registry editor)
This web browser navigates to a specific URL, fills in all fields on the HTML page and submits them; then the result page is displayed.
My problem is that I just want to know when this result page is fully loaded or completed so that I can fetch some information.
I use the WebBroswer_DocumentCompleted event, which works fine for the first page but not for the result page, as it triggers before the result page is loaded.
I tried another solution, which is to check for the div tag inside the result page (this tag only appears when the result page is loaded completely), and it works, but not always.
My code:
private void WebBroswer_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
HtmlElementCollection elc3 = this.BotBrowser.Document.GetElementsByTagName("div");
foreach (HtmlElement el in elc3)
{
if (el.GetAttribute("id").Equals("Summary_Views")) //this determine i am at the result page
{
// fetch the result
}
}
}
That div id is "Summary_Views".
I can provide you the link of that website on demand which is just for BLAST tools and database website for research purpose.
Frames and IFrames will cause this event to fire multiple times. Check out this answer:
HTML - How do I know when all frames are loaded?
Or this answer:
How to use WebBrowser control DocumentCompleted event in C#?
Or Microsoft's KB article:
http://support.microsoft.com/kb/180366
Do you know if there are frames? If so then please say so, so people can help with that. If not then say so, so people can offer alternatives.
My guess is that the content is being generated by JavaScript. If it is then the document is complete before the JavaScript executes and you need to somehow wait until the JavaScript is done. The solution depends upon the web page. So you might need to process multiple document completes for diagnostic purposes and attempt to determine if there is a way to know which one you need.
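One common way to ignore the extra DocumentCompleted events raised for frames is to compare the URL in the event arguments with the control's own URL. The helper below is only a sketch of that comparison; the class and method names are invented for illustration, and it does not help when the content is filled in later by JavaScript.

```csharp
using System;

public static class DocumentCompletedFilter
{
    // Returns true when the completed URL is the top-level page the browser
    // is showing, and false when it belongs to a frame or iframe.
    public static bool IsTopLevel(Uri completedUrl, Uri browserUrl)
    {
        if (completedUrl == null || browserUrl == null)
        {
            return false;
        }
        return completedUrl.Equals(browserUrl);
    }
}
```

In the handler from the question this would be used as `if (!DocumentCompletedFilter.IsTopLevel(e.Url, this.BotBrowser.Url)) return;` before searching for the `Summary_Views` div.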
At last I have solved my problem. I added a timer control from the toolbox and set its interval to 200 ms and its AutoReset property to false. I set a tick event whose code checks every 200 ms whether this div has been loaded or not; after that, the AutoReset property is set to true. This solution is working perfectly :)

C# Webbrowser navigating links in order

I'm trying to teach myself C# and to start I'm trying to convert a program I originally wrote in Autoit.
I'm using a Windows Forms application, and the program is supposed to take one or two links as input. It navigates to those two pages, grabs some links from a table, then visits each of those pages to grab some content.
If only one link is entered, it seems to go to that page and grab the links from the table like it is supposed to. If two links are entered, it seems to only grab the links from the second table.
So if two links are passed this method
private void getPageURLList(string site1, string site2)
{
getPageURLList(site1);
getPageURLList(site2);
}
Calls the same method that gets called when there is only one link
private void getPageURLList(string site)
{
webBrowser.DocumentCompleted += createList;
webBrowser.Navigate(site);
}
I'm pretty sure the issue is "Navigate" is getting called a second time before createList even starts the first time.
The reason I am using WebBrowser is because these pages use Javascript to sort the links in the table so HTTPRequests and the HTMLAgilityPack don't seem to be able to grab those links.
So I guess my question is: How can I keep my WebBrowser from navigating to a new page until after I finish what I'm doing on the current page?
You have to make a bool variable to know when the first process has completed, and then start the other. Application.DoEvents() will help you.
Note that all these events run on the main thread.
In your DocumentCompleted event you do your link processing. At the end of the link processing you tell the browser to navigate to the next URL.
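The idea of navigating to the next URL only after the current page has been processed can be sketched with a small queue. `SequentialNavigator` is a hypothetical helper, not part of WinForms; it takes the navigate action as a delegate so the ordering logic stays independent of the browser control.

```csharp
using System;
using System.Collections.Generic;

public sealed class SequentialNavigator
{
    private readonly Queue<string> pending = new Queue<string>();
    private readonly Action<string> navigate;

    // navigate is whatever starts loading a URL, e.g. url => webBrowser.Navigate(url).
    public SequentialNavigator(Action<string> navigate)
    {
        this.navigate = navigate;
    }

    // True while a page has been requested and not yet reported as done.
    public bool IsBusy { get; private set; }

    // Queues a URL; starts loading it immediately only if nothing is in progress.
    public void Enqueue(string url)
    {
        pending.Enqueue(url);
        if (!IsBusy)
        {
            StartNext();
        }
    }

    // Call this at the end of the DocumentCompleted handler,
    // after the links on the current page have been processed.
    public void PageDone()
    {
        IsBusy = false;
        StartNext();
    }

    private void StartNext()
    {
        if (pending.Count == 0)
        {
            return;
        }
        IsBusy = true;
        navigate(pending.Dequeue());
    }
}
```

With this, `getPageURLList(site1, site2)` would enqueue both sites, `createList` would be subscribed to `DocumentCompleted` once rather than on every call, and `createList` would call `PageDone()` as its last step so the second page loads only after the first has been handled.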

Trying to play a youtube video using MediaPlayerLauncher

I'm coding a windows phone 7.1 app, and when the user clicks a specific button, a youtube video would be played using a MediaPlayerLauncher.
Problem is, MediaPlayerLauncher can't play the video just by giving it the video's url; the video's link itself is in the page's html. Now, I managed to pull out that html by using a WebClient() to download the html and extract the video's link from it, by attaching this event for 'client', my WebClient:
void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
pageHtml = e.Result;
stringBuilder = new StringBuilder(pageHtml);
if (pageHtml != null)
{
if (pageHtml.Contains("<html"))
{
if (pageHtml.Contains("<script"))
{
stringBuilder.Replace("</script>", string.Format("{0}{1}</script>", NOTIFY_SCRIPT, Environment.NewLine));
}
else if (pageHtml.Contains("<head"))
{
stringBuilder.Replace("</head>", string.Format("{0}{1}</head>", NOTIFY_SCRIPT, Environment.NewLine));
}
else
{
stringBuilder.Replace("</html>", string.Format("{0}{1}</html>", NOTIFY_SCRIPT, Environment.NewLine));
}
}
else
{
//Just skip it or display an error message or whatever
}
rssBrowser.NavigateToString(stringBuilder.ToString());
}
}
Basically, I add a script, 'NOTIFY_SCRIPT', which detects the presence of the YouTube video (if you want more details about this, the video's link is basically in an tag, so I just get all the tags, find the one with the link, and get its contents (the link)).
But still, this just doesn't work. I tried putting up a WebBrowser and making it navigate and triggering the event every time the WebBrowser navigates, in order to make sure that it's navigating to the correct page. But sometimes, it doesn't navigate properly; it gets stuck on an intermediate page or goes to the original youtube page. So, I decided to take a look at the incoming html. For some reason, the incoming html is missing youtube's script. I checked the script on the youtube page using my browser (I went to the mobile web page and 'inspected the element'), and it has a script there, but when the WebClient gets the html the script is missing.
So maybe that's the problem? Does anyone know how to solve this problem, or maybe you have already done something like that in a different way?
Sorry for the long question, and thanks!
You may be better off using the WebBrowserTask or the API to get the URL.
