I am new to C# programming. I am trying to scrape data from a div (I want to display the temperature from a web page in a Windows Forms application).
This is my code:
private void btnOnet_Click(object sender, EventArgs e)
{
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
var temperatura = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[3]/div/section/div/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]");
onet.Text = temperatura.InnerText;
}
This is the exception:
System.NullReferenceException:
temperatura was null.
You can use this:
public static bool TryGetTemperature(HtmlAgilityPack.HtmlDocument doc, out int temperature)
{
    temperature = 0;
    // Find a div whose class contains "temp" directly inside a div whose class contains "temperature".
    var temp = doc.DocumentNode.SelectSingleNode(
        "//div[contains(@class, 'temperature')]/div[contains(@class, 'temp')]");
    if (temp == null)
    {
        return false;
    }
    // The degree sign arrives entity-encoded as "&deg;" (5 characters), so strip it before parsing.
    var text = temp.InnerText.EndsWith("&deg;") ?
        temp.InnerText.Substring(0, temp.InnerText.Length - 5) :
        temp.InnerText;
    return int.TryParse(text, out temperature);
}
With a relative XPath query you can select your target more precisely. Your absolute path breaks as soon as the HTML structure changes slightly. Some points:
// searches anywhere in the document
The query looks for any div whose class contains "temperature" and, inside that node:
a child div whose class contains "temp"
If that node is found (!= null), the code tries to convert the degrees to an integer (after removing the degree sign)
And check:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
if (TryGetTemperature(doc, out int temperature))
{
onet.Text = temperature.ToString();
}
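For illustration, the text clean-up step can also be sketched on its own. The sketch below is in JavaScript rather than C#, the helper name parseTemperature is hypothetical, and it assumes the node text looks like "23°" or "23&deg;":

```javascript
// Hypothetical helper: turns scraped text such as "23°" or "23&deg;"
// into an integer, or returns null when the text is not a whole number.
function parseTemperature(text) {
  const cleaned = text
    .trim()
    .replace(/&deg;$/, "") // entity-encoded degree sign
    .replace(/°$/, "")     // literal degree sign
    .trim();
  // Accept an optional minus sign followed by digits only.
  return /^-?\d+$/.test(cleaned) ? Number.parseInt(cleaned, 10) : null;
}

console.log(parseTemperature("23&deg;"));
console.log(parseTemperature("n/a"));
```

Returning null for unexpected text plays the same role as the false return value of TryGetTemperature: the caller can tell "no temperature found" apart from a real reading.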
UPDATE
I updated TryGetTemperature a bit because the degree sign is encoded. The main problem is the HTML. When you request the source code, you get HTML that the browser later updates dynamically. So the HTML you get is of no use to you: it doesn't contain the temperature.
So, I see two alternatives:
You can use a browser control (Common Controls -> WebBrowser, in the Forms toolbox next to Button, Label...), add it to your form and navigate to the page. It's not difficult, but you need to learn a few things: wait for the event that signals the page has been downloaded, then get the source code from the control. Also, I suppose you'll want to hide the browser control. Be careful: sometimes the browser doesn't work correctly when it is hidden. In that case, you can use a visible form positioned off the desktop and handle the activation events so that this window is never activated. Also hide it from the task switcher (Alt+Tab). Things get harder this way, but sometimes it is the only way.
The simple way is to search for the location you want (e.g. Madryt) and look in DevTools at the request that is made (e.g. https://pogoda.onet.pl/prognoza-pogody/madryt-396099). Use this URL and you get valid HTML.
How would I open a new window in JavaScript and insert HTML data instead of just linking to an HTML file?
I would not recommend using document.write as others suggest, because if you open such a window twice, your HTML will be duplicated (two or more times).
Use innerHTML instead:
var win = window.open("", "Title", "toolbar=no,location=no,directories=no,status=no,menubar=no,scrollbars=yes,resizable=yes,width=780,height=200,top="+(screen.height-400)+",left="+(screen.width-840));
win.document.body.innerHTML = "HTML";
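A long features string like the one above is easy to mistype. As a minimal sketch, it could be assembled from a plain object; buildWindowFeatures is a hypothetical helper, not part of any browser API:

```javascript
// Hypothetical helper: builds the third argument of window.open()
// from a plain object, e.g. { width: 780, scrollbars: true }.
function buildWindowFeatures(options) {
  return Object.entries(options)
    .map(([name, value]) => {
      // Booleans become yes/no; everything else is used as-is.
      const text = typeof value === "boolean" ? (value ? "yes" : "no") : String(value);
      return `${name}=${text}`;
    })
    .join(",");
}

const features = buildWindowFeatures({
  toolbar: false,
  scrollbars: true,
  resizable: true,
  width: 780,
  height: 200,
});
console.log(features);
// In a browser: const win = window.open("", "Title", features);
```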
You can use window.open to open a new window or tab (depending on the browser settings) in JavaScript.
By using document.write you can write HTML content to the opened window.
When you create a new window using open, it returns a reference to the new window. You can use that reference to write to the newly opened window via its document object.
Here is an example:
var newWin = open('url','windowName','height=300,width=300');
newWin.document.write('html to write...');
Here's how to do it with an HTML Blob, so that you have control over the entire HTML document:
https://codepen.io/trusktr/pen/mdeQbKG?editors=0010
This is the code, but StackOverflow blocks the window from being opened (see the codepen example instead):
const winHtml = `<!DOCTYPE html>
<html>
<head>
<title>Window with Blob</title>
</head>
<body>
<h1>Hello from the new window!</h1>
</body>
</html>`;
const winUrl = URL.createObjectURL(
new Blob([winHtml], { type: "text/html" })
);
const win = window.open(
winUrl,
"win",
`width=800,height=400,screenX=200,screenY=200`
);
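One detail worth knowing with this approach: an object URL keeps its Blob alive until it is revoked with URL.revokeObjectURL or the creating document goes away. A minimal sketch, assuming a hypothetical helper named createHtmlObjectUrl and a runtime that provides Blob and URL.createObjectURL (browsers, recent Node.js):

```javascript
// Hypothetical helper: wraps an HTML string in a Blob and returns an object URL for it.
function createHtmlObjectUrl(html) {
  const blob = new Blob([html], { type: "text/html" });
  return URL.createObjectURL(blob);
}

const demoUrl = createHtmlObjectUrl("<!DOCTYPE html><title>Demo</title><h1>Hello</h1>");
console.log(demoUrl.startsWith("blob:"));

// In a browser: const win = window.open(demoUrl, "win", "width=800,height=400");
// Once the URL is no longer needed, release it so the Blob can be freed.
URL.revokeObjectURL(demoUrl);
```

In a real page the revoke call should only run after the new window has finished loading the document.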
You can open a new popup window with the following code:
var myWindow = window.open("", "newWindow", "width=500,height=700");
//window.open('url','name','specs');
Afterwards, you can add HTML using either myWindow.document.write(); or myWindow.document.body.innerHTML = "HTML";
What I recommend is that you first create a new HTML file with any name.
In this example I am using
newFile.html
Make sure to add everything that file needs, such as the Bootstrap CDN or jQuery, meaning all the links and scripts. Then create a div with an id, or give your body an id. In this example I have given id="mainBody" to the <body> tag of newFile.html:
<body id="mainBody">
Then open this file using
<script>
var myWindow = window.open("newFile.html", "newWindow", "width=500,height=700");
</script>
Then add whatever you want to your body tag using the following code:
<script>
var myWindow = window.open("newFile.html","newWindow","width=500,height=700");
myWindow.onload = function(){
let content = "<button class='btn btn-primary' onclick='window.print();'>Confirm</button>";
myWindow.document.getElementById('mainBody').innerHTML = content;
}
// Call myWindow.close() only once you are done with the window; calling it here would close it right away.
</script>
It is as simple as that.
You can also create an "example.html" page which has your desired html and give that page's url as parameter to window.open
var url = '/example.html';
var myWindow = window.open(url, "", "width=800,height=600");
Use this one. It worked perfectly for me.
For a new window:
new_window = window.open(URL.createObjectURL(new Blob([HTML_CONTENT], { type: "text/html" })))
For a pop-up (the features string must be the third argument, so an empty window name is passed as the second):
new_window = window.open(URL.createObjectURL(new Blob([HTML_CONTENT], { type: "text/html" })), "", "width=800,height=600")
Replace HTML_CONTENT with your own HTML code.
Like:
new_window = window.open(URL.createObjectURL(new Blob(["<h1>Hello</h1>"], { type: "text/html" })))
If your window.open() and innerHTML work fine, ignore this answer.
The following answer focuses only on the cross-origin access exception.
#key-in_short,workaround:: [for the cross-origin access exception]
Say you run code in main.html that tries to access the file window_ImageGallery.html using window.open() and innerHTML.
If you encounter the cross-origin access exception
and you don't want to disable or mess around with Chrome's security policy,
-> you may use a query string to transfer the HTML code data, as a workaround.
#details::
#problem-given_situation,#problem-arise_problem::
Say you execute the following simple window.open command, as another answer suggested.
let window_Test = window.open('window_ImageGallery.html', 'Image Enlarged Window' + $(this).attr('src'), 'width=1000,height=800,top=50,left=50');
window_Test.document.body.innerHTML = 'aaaaaa';
You may encounter the following cross-origin access exception:
window_Test.document.body.innerHTML = 'aaaaaa'; // < Exception here
Uncaught DOMException: Blocked a frame with origin "null" from accessing a cross-origin frame.
=> #problem-solution-workaround::
You may use a query string to transfer the HTML code data, as a workaround. <- Transfer data from one HTML file to another
#eg::
in your main.html
// #>> open ViewerJs in a new html window
eleJq_Img.click(function() {
// #>>> send some query string data -- a list of <img> tags, to the new html window
// #repeat: must use Query String to pass html code data, else you get `Uncaught DOMException: Blocked a frame with origin "null" from accessing a cross-origin frame.` (cross origin access issue)
let id_ThisImg = this.id;
let ind_ThisImg = this.getAttribute('data-index-img');
let url_file_html_window_ImageGallery = 'window_ImageGallery.html'
+ '?queryStr_html_ListOfImages=' + encodeURIComponent(html_ListOfImages)
+ '&queryStr_id_ThisImg=' + encodeURIComponent(id_ThisImg)
+ '&queryStr_ind_ThisImg=' + encodeURIComponent(ind_ThisImg);
// #>>> open ViewerJs in a new html window
let window_ImageGallery = window.open(url_file_html_window_ImageGallery, undefined, 'width=1000,height=800,top=50,left=50');
});
in your window_ImageGallery.html
window.onload = function () {
// #>> get parameter from URL
// #repeat: must use Query String to pass html code data, else you get `Uncaught DOMException: Blocked a frame with origin "null" from accessing a cross-origin frame.` (cross origin access issue)
// https://stackoverflow.com/questions/17502071/transfer-data-from-one-html-file-to-another
let data = getParamFromUrl();
let html_ListOfImages = decodeURIComponent(data.queryStr_html_ListOfImages);
let id_ThisImgThatOpenedTheHtmlWindow = decodeURIComponent(data.queryStr_id_ThisImg);
let ind_ThisImgThatOpenedTheHtmlWindow = decodeURIComponent(data.queryStr_ind_ThisImg);
// #>> add the Images to the list
document.getElementById('windowImageGallery_ContainerOfInsertedImages').innerHTML = html_ListOfImages;
// -------- do your stuff with the html code data
};
function getParamFromUrl() {
let url = document.location.href;
let params = url.split('?')[1].split('&');
let data = {};
let tmp;
for (let i = 0, l = params.length; i < l; i++) {
tmp = params[i].split('=');
data[tmp[0]] = tmp[1];
}
return data
}
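As an alternative sketch, the standard URLSearchParams class can do the encoding and decoding, so no manual split on '&' and '=' is needed, and a URL without a query string does not throw. The parameter names are the ones used above; buildGalleryUrl and getParamsFromUrl are hypothetical helper names:

```javascript
// Build the URL of the child page; URLSearchParams percent-encodes the values.
function buildGalleryUrl(page, htmlListOfImages, idThisImg, indThisImg) {
  const query = new URLSearchParams({
    queryStr_html_ListOfImages: htmlListOfImages,
    queryStr_id_ThisImg: idThisImg,
    queryStr_ind_ThisImg: String(indThisImg),
  });
  return `${page}?${query.toString()}`;
}

// Read the parameters back; values come out already decoded.
function getParamsFromUrl(href) {
  const queryStart = href.indexOf("?");
  if (queryStart === -1) return {}; // no query string at all
  return Object.fromEntries(new URLSearchParams(href.slice(queryStart + 1)));
}

const galleryUrl = buildGalleryUrl(
  "window_ImageGallery.html",
  '<img src="a.png" id="img1">',
  "img1",
  0
);
const galleryData = getParamsFromUrl(galleryUrl);
console.log(galleryData.queryStr_html_ListOfImages);
```

Because the values are already decoded, the extra decodeURIComponent calls in window.onload would not be needed with this variant. Keep in mind that very long HTML fragments may run into URL length limits.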
#minor-note::
(It seems that) sometimes you may not get the cross-origin access exception,
because main.html modifies the HTML of window_ImageGallery.html before window_ImageGallery.html has loaded.
The above statement is based on my tests
and another answer -- window.open: is it possible open a new window with modify its DOM
If you want to make sure you see that exception,
you can wait until the opened HTML window finishes loading, then continue executing your code.
#eg::
use defer() <- Waiting for child window loading to complete
let window_ImageGallery = window.open('window_ImageGallery.html', undefined, 'width=1000,height=800,top=50,left=50');
window_ImageGallery.addEventListener("unload", function () {
defer(function (){
console.log(window_ImageGallery.document.body); // < Exception here
});
});
function defer (callback) {
var channel = new MessageChannel();
channel.port1.onmessage = function (e) {
callback();
};
channel.port2.postMessage(null);
}
or use sleep() with async <- What is the JavaScript version of sleep()?
eleJq_Img.click(async function() {
...
let window_Test = window.open( ...
...
await new Promise(r => setTimeout(r, 2000));
console.log(window_Test.document.body.innerHTML); // < Exception here
});
or you get a null reference error
if you try to access elements in window_ImageGallery.html
#minor-comment::
There are too many similar posts about the cross-origin issue, and some posts about window.open().
I don't know which post is the best place for this answer, so I picked this one.
I'm trying to scrape a link from the source code of a website; the link changes every time I fetch the source.
For example:
<div align="center">
<a href="http://www10.site.com/d/the rest of the link">
<span class="button_upload green">
The next time I get the source code, the http://www10 changes to http://www plus some other number, like http://www65.
How can I scrape the exact link with the newly changed number?
Edit:
Here's how I use the regex:
MatchCollection m1 = Regex.Matches(textBox6.Text, "(href=\"http://www10)(?<td_inner>.*?)(\">)", RegexOptions.Singleline);
You mentioned in the comments that you use regular expressions to parse the HTML document. That is the hardest way to do this (and generally not recommended!). Try using an HTML parser like http://html-agility-pack.net
For HTML Agility Pack: you install it via NuGet packages, and here is an example (based on the one posted on their website):
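HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Select every <a> element that has an href attribute.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink is your own method that returns the corrected URL
}
doc.Save("file.htm");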
It can also load string contents, not just files. You use XPath (or CSS selectors, via an extension library) to navigate inside the document and select what you want.
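Once the href values have been extracted by a parser, the changing number can be handled with a pattern on the host name instead of a hard-coded www10. A minimal sketch in JavaScript, assuming the host is always "www" plus digits plus ".site.com" and the path starts with "/d/" as in the question; findDownloadLinks is a hypothetical helper name:

```javascript
// Hypothetical helper: keeps only links whose host is "www<number>.site.com"
// and whose path starts with "/d/".
function findDownloadLinks(hrefs) {
  return hrefs.filter((href) => {
    let url;
    try {
      url = new URL(href);
    } catch {
      return false; // relative or malformed link
    }
    return /^www\d+\.site\.com$/.test(url.hostname) && url.pathname.startsWith("/d/");
  });
}

const sampleLinks = [
  "http://www10.site.com/d/abc",
  "http://www65.site.com/d/xyz",
  "http://www.site.com/d/no-number",
  "http://example.com/d/abc",
  "/relative/link",
];
console.log(findDownloadLinks(sampleLinks));
```

The same idea carries over to a C# regular expression: replace the literal www10 with www\d+.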
How about a JS function like this, run when the page loads:
// jQuery is required!
var updateLinkUrl = function (num) {
$.each($('.button_upload.green'), function (pos, el) {
var orig = $(el).parent().prop("href");
var newurl = orig.replace("www10", "www" + num);
$(el).parent().prop("href", newurl);
});
};
$(document).ready(function () { updateLinkUrl(65); });
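The function above only works while the original link contains www10. A small sketch of the replacement step that accepts any number, without jQuery; replaceServerNumber is a hypothetical helper name:

```javascript
// Hypothetical helper: swaps the numeric part of a "www<number>" host prefix.
function replaceServerNumber(href, num) {
  // A replacer function avoids ambiguity between "$1" and the digits that follow it.
  return href.replace(/^(https?:\/\/www)\d+(?=\.)/, (match, prefix) => prefix + num);
}

console.log(replaceServerNumber("http://www10.site.com/d/file", 65));
```

Links that do not start with "www" followed by digits are returned unchanged.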
I am trying to send a string variable containing HTML code to a textbox located inside a frame. The HTML code looks like this:
<iframe id="rte" class="rteIfm" frameborder="0" contenteditable="" title="Description">
<html>
<head>
</head>
<body role="textbox" aria-multiline="true">
</body>
</html>
</iframe>
I have tried two things...
Firstly, I tried switching frames and using the XPath that Firebug gave me to send the keys:
driver.SwitchTo().Frame(driver.FindElement(By.Id("rte")));
driver.FindElement(By.XPath("/html/body")).SendKeys(myStringContainingHTML);
Secondly, I tried sending the keys to the element with the ID the same as the frame:
driver.FindElement(By.Id("rte")).SendKeys(myStringContainingHTML);
In both cases the same thing happened: at first the string (containing HTML code) began to be typed into the textbox as expected. Then, after about one tag had been typed, the browser started navigating to different pages. It went to Google, started typing in the search box, and then searched for chunks of the HTML code that were in the string.
Seems very strange to me, where did I go wrong?
I still don't know why this is happening, but the workaround I'm using is to programmatically copy the string to the clipboard and paste it with WebDriver SendKeys(), which is normally as simple as:
Clipboard.SetText(myStringContainingHTML);
driver.FindElement(By.Id("myTxtBoxId")).SendKeys(OpenQA.Selenium.Keys.LeftControl + "v");
But I was actually doing this from multiple threads and got the error:
"Current thread must be set to single thread apartment (STA) mode before OLE calls can be made. Ensure that your Main function has STAThreadAttribute marked on it."
So I had to use this workaround just to get the string into the clipboard:
class MyAsyncClass
{
static IWebDriver driver;
public static void MyAsyncMethod()
{
FirefoxProfile myProfile = new FirefoxProfile();
driver = new FirefoxDriver(myProfile);
driver.Manage().Timeouts().ImplicitlyWait(TimeSpan.FromSeconds(20));
STAClipBoard(myStringWithHtmlCode);
driver.FindElement(By.Id("myTxtBoxId")).SendKeys(OpenQA.Selenium.Keys.LeftControl + "v");
}
private static void STAClipBoard(string myStringWithHtmlCode)
{
ClipClass clipClass = new ClipClass();
clipClass.myString = myStringWithHtmlCode;
System.Threading.Thread t = new System.Threading.Thread(clipClass.CopyToClipBoard);
t.SetApartmentState(System.Threading.ApartmentState.STA);
t.Start();
t.Join();
}
}//class
public class ClipClass
{
public string myString;
public void CopyToClipBoard()
{
Clipboard.SetText(myString);
}
}
}
I'm doing some web automation via C# and a WebBrowser. There's a link which I need to 'click', but since it fires a Javascript function, apparently the code needs to be executed rather than just having the element clicked (i.e. element.InvokeMember("click")). Here's the href for the element, which opens an Ajax form:
javascript:__doPostBack("ctl00$cphMain$lnkNameserverUpdate", "")
I've tried:
webBrowser1.Document.InvokeScript("javascript:__doPostBack", new object[] { "ctl00$cphMain$lnkNameserverUpdate", "" });
and:
webBrowser1.Document.InvokeScript("__doPostBack", new object[] { "ctl00$cphMain$lnkNameserverUpdate", "" });
and a few other things. The code gets hit, but the script doesn't get fired. Any ideas would be most appreciated.
Gregg
BTW Here's the full element in case it's useful:
NS51.DOMAINCONTROL.COM<br/>NS52.DOMAINCONTROL.COM<br/>
Have a look at this link:
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.objectforscripting.aspx
I've actually used this in the past, and it works perfectly.
HtmlDocument doc = browser.Document;
HtmlElement head = doc.GetElementsByTagName("head")[0];
HtmlElement s = doc.CreateElement("script");
s.SetAttribute("text", "function sayHello() { alert('hello'); }");
head.AppendChild(s);
browser.Document.InvokeScript("sayHello");
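JavaScript resolves function names case-sensitively, so the name passed to InvokeScript has to match the function definition exactly. The stand-alone sketch below only imitates a lookup by name to show the effect; invokeByName and scriptScope are hypothetical and are not part of the WebBrowser API:

```javascript
// Stand-in for the page's script scope: one function, defined with a capital H.
const scriptScope = {
  sayHello() {
    return "hello";
  },
};

// Hypothetical lookup by name: calls the function if it exists, otherwise returns undefined.
function invokeByName(scope, name, args = []) {
  const fn = scope[name];
  return typeof fn === "function" ? fn.apply(scope, args) : undefined;
}

console.log(invokeByName(scriptScope, "sayHello")); // name matches the definition
console.log(invokeByName(scriptScope, "sayhello")); // different case, nothing is called
```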