I have been working on some code to scrape data from a webpage using C#. Unfortunately, after getting the code to work, I noticed that the webpage I have to implement it on uses jQuery on the input my code was meant to work with.
So now I have to change my code to work with jQuery. I was wondering whether page scraping is even possible in jQuery, and whether anyone could give me a hand?
I have found the jQuery snippet below, but I have no idea if it will work, as my jQuery skills are very weak.
P.S. I am coding in ASP.NET Web Forms using a C# code-behind.
EDIT: It would also work if I could execute the current JavaScript from the code-behind.
$.get("/path/to/other/page", function (data) {
    $('#data').append($('li', data));
});
My C# code is:
protected void GetAverageRent_TextChanged(object sender, EventArgs e)
{
string Postcode = _postCodeInput.Value.ToString();
var webGet = new HtmlWeb();
var doc = webGet.Load("http://www.webaddress.com" + Postcode);
HtmlNode AvgPrice = doc.DocumentNode.SelectSingleNode("//div[@class='split2r right']//strong[@class='price big']");
if (AvgPrice != null)
{
AverageRentLbl.Text = AvgPrice.InnerHtml.ToString();
}
else
{
AverageRentLbl.Text = "Invalid Postcode!";
AverageRentLbl.ForeColor = Color.Red;
AverageRentLbl.Font.Bold = true;
}
}
I am new to C# programming. I am trying to scrape data from a div (I want to display the temperature from a web page in a Windows Forms application).
This is my code:
private void btnOnet_Click(object sender, EventArgs e)
{
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
var temperatura = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[3]/div/section/div/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]");
onet.Text = temperatura.InnerText;
}
This is the exception:
System.NullReferenceException:
temperatura was null.
You can use this:
public static bool TryGetTemperature(HtmlAgilityPack.HtmlDocument doc, out int temperature)
{
    temperature = 0;
    // Any div whose class contains "temperature", then a child div whose class contains "temp".
    var temp = doc.DocumentNode.SelectSingleNode(
        "//div[contains(@class, 'temperature')]/div[contains(@class, 'temp')]");
    if (temp == null)
    {
        return false;
    }
    // The degree sign is HTML-encoded; "&deg;" is 5 characters, hence Length - 5.
    var text = temp.InnerText.EndsWith("&deg;") ?
        temp.InnerText.Substring(0, temp.InnerText.Length - 5) :
        temp.InnerText;
    return int.TryParse(text, out temperature);
}
With a relative XPath query like this one, you can select your target more precisely. With your absolute query, even a small change in the HTML structure will break your application. Some points:
// searches anywhere in the document
You search for any div whose class attribute contains "temperature" and, inside that node:
you search for a child div whose class attribute contains "temp"
If you get that node (!= null), you try to convert the degrees to an int (after removing the degree sign)
Then check:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
if (TryGetTemperature(doc, out int temperature))
{
onet.Text = temperature.ToString();
}
UPDATE
I updated TryGetTemperature slightly because the degree sign is HTML-encoded. The main problem is the HTML itself. When you request the source code, you get HTML that the browser later updates dynamically. So the HTML you download is of no use to you: it doesn't contain the temperature.
So, I see two alternatives:
You can use a browser control (Common Controls -> WebBrowser in the form designer's toolbox, next to Button, Label and so on): add it to your form and navigate to the page. It's not difficult, but you need to learn a few things: wait for the event that signals the page has finished loading, then read the source code from the control. I suppose you'll also want to hide the browser control. Be careful: sometimes the browser doesn't work correctly when it is hidden. In that case, you can use a visible form positioned off the desktop and handle the activation events so that this window is never activated. Also hide it from the task switcher (Alt+Tab). Things become harder this way, but sometimes it is the only way.
The simpler way is to search for the location you want (e.g. Madryt) and look in DevTools at the request that is made (e.g. https://pogoda.onet.pl/prognoza-pogody/madryt-396099). Load that URL instead and you get HTML that contains the data.
My browser just keeps loading when I call NavigateToPage using ScrapySharp, and execution never reaches the next line of code. Below is my code, which runs in a C# ASP.NET Web Forms application. May I know why? The link I use works and can be browsed manually. The code just gets stuck at Browser.NavigateToPage(new Uri("http://www.asnb.com.my/v3_/asnbv2_0index.php")); and the browser keeps loading.
ScrapingBrowser Browser = new ScrapingBrowser();
Browser.AllowAutoRedirect = true;
Browser.AllowMetaRedirect = true;
WebPage PageResult = Browser.NavigateToPage(new Uri("http://www.asnb.com.my/v3_/asnbv2_0index.php"));
HtmlNode TitleNode = PageResult.Html.CssSelect(".navbar-brand").First();
I was having the same problem and decided not to use Browser.NavigateToPage, and instead to get the equivalent of PageResult.Html using an HtmlDocument.
For example:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://www.asnb.com.my/v3_/asnbv2_0index.php");
HtmlNode TitleNode = doc.DocumentNode.CssSelect(".navbar-brand").First();
This should get you your expected results.
Move your call to a BackgroundWorker thread. Notice that at line 353 of ScrapingBrowser.cs (ScrapySharp/ScrapySharp/Network/ScrapingBrowser.cs), NavigateToPage() calls the async version and blocks on its Result:
public WebPage NavigateToPage(Uri url, HttpVerb verb = HttpVerb.Get, string data = "", string contentType = null)
{
return NavigateToPageAsync(url, verb, data, contentType).Result;
}
I had the same problem; as soon as I moved the call into the DoWork method of my BackgroundWorker, it started behaving the way you would expect.
Another option is to use the async version of NavigateToPage, e.g.:
private async Task<WebPage> LoadPage(Uri uri)
{
WebPage page = await browser.NavigateToPageAsync(uri);
return page;
}
I've done quite a bit of searching (several hours actually) but I haven't been able to get this working. Basically, I have this button:
<asp:Button runat="server" Text="Go!" id="go" onClick="getDoc()" />
and this block of script:
<script type="c#" runat="server">
public void getDoc(object sender, EventArgs e) {
// Test to see if function was running (it's not...)
DocFrame.Attributes["src"] = "http://www.google.com";
// Get the current state of the dropdowns
String dropYear = (String)Year.SelectedValue;
String dropDiv = (String)Division.SelectedValue;
String dropControl = (String)Control.SelectedValue;
String dropQuart= (String)Quarter.SelectedValue;
// Get the Site where the list is
using (SPSite siteCol = new SPSite("http://portal/Corporate/IT/")) {
using (SPWeb web = siteCol.RootWeb){
// Get the list items we need
SPListItemCollection items = list.GetItems("Year", "Division", "Control", "Quarter");
SPListItem item = null;
// Loop through them until we find a matching everything
foreach (SPListItem it in items){
if(it.Year == dropYear && it.Division == dropDiv && it.Control == dropControl && it.Quarter == dropQuart){
item = it;
break;
}
}
// Assign the item as a string
String URL = (String)item["Title"];
// Set the iframe to the new URL
DocFrame.Attributes["src"] = URL;
}
}
}
It's all in the page where this is happening. Please keep in mind that I've been using SharePoint for less than a week and have only ever coded in C++, so I could be doing everything horribly wrong. Anyway, it seems that getDoc() is never even called, so can anyone point out what I'm doing wrong?
Instead of
onClick="getDoc()"
you should do
OnClick="getDoc"
That's the proper way to wire up an event: the attribute takes the name of the handler method, not a call to it.
By the way, you should consider following the C# naming guidelines. With better naming, it might look like this:
<asp:Button runat="server" Text="Go!" id="GoBtn" OnClick="GoBtn_Click" />
The common convention is to append the event name to the ID of the control. It's not required, but it looks cleaner, and other developers like to see it when they read your code.
Also, DocFrame.Attributes["src"] = "http://www.google.com"; is not a good way to check whether the function is running. It doesn't update the page in real time: the entire server-side function executes first, and only then are the results sent to the client. Instead, use your IDE's debugging tools to attach to the server and set breakpoints. Alternatively, what I do is have the code send me an email; I created a little utility library for that.
I have an odd request. I am wondering whether it is possible to have my C# solution publish the entire site from the master database to the web database. I am in my development environment, and the number of items in Sitecore changes daily because I am working with multiple people. This is not going to be used for production Content Management or Content Delivery; it is purely for development.
Is it possible to trigger a full Site publish in C# just like pressing the publish button in the content editor? And what would the code look like?
/sitecore/shell/Applications/Publish.aspx
I assume this C# method would work with Sitecore.Publishing.PublishManager?
Thanks!
We also use that on our projects...
We have that placed in a regular .aspx page. I hope that helps:
protected void Page_Load(object sender, EventArgs e)
{
PublishMode publishMode = PublishMode.Full;
using (new Sitecore.SecurityModel.SecurityDisabler())
{
var webDb = Sitecore.Configuration.Factory.GetDatabase("web");
var masterDb = Sitecore.Configuration.Factory.GetDatabase("master");
try
{
foreach (Language language in masterDb.Languages)
{
//loops on the languages and do a full republish on the whole sitecore content tree
var options = new PublishOptions(masterDb, webDb, publishMode, language, DateTime.Now) { RootItem = masterDb.Items["/sitecore"], RepublishAll = true, Deep = true };
var myPublisher = new Publisher(options);
myPublisher.Publish();
}
}
catch (Exception ex)
{
Sitecore.Diagnostics.Log.Error("Could not publish", ex);
}
}
}
I am trying to learn asp.net. Assuming that I have this code:
if (command.ExecuteNonQuery() == 0)
{
// JavaScript like alert("true");
}
else
{
// JavaScript like alert("false");
}
How can I invoke JavaScript from the C# code-behind? And how can I do that with the JavaScript placed in the Scripts directory that MS Visual Studio creates by default?
Here is a method I use from time to time to show a pop-up message from the code-behind. I try to avoid having to do this, but sometimes I need to.
private void LoadClientScriptMessage(string message)
{
StringBuilder script = new StringBuilder();
// Note: a message containing an apostrophe or a line break will break the generated script.
script.Append(@"<script language='javascript'>");
script.Append(@"alert('" + message + "');");
script.Append(@"</script>");
Page.ClientScript.RegisterStartupScript(this.GetType(), "messageScript", script.ToString());
}
You can use RegisterStartupScript to call a JavaScript function from the code-behind.
Please note that the JavaScript only runs on the client side, once the page has been rendered in the client's browser.
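To tie this back to the question about the Scripts directory: the function named in the registered script has to be defined on the page before the startup script runs, so one option is to put it in a file under Scripts and reference that file from the page. A minimal sketch follows. The file name Scripts/notifications.js, the rowsAffected parameter and the optional notify callback are assumptions made for illustration; they are not part of the question or of this answer.

```javascript
// Scripts/notifications.js (hypothetical file name)

// Maps the row count from the server to the text the question wants to show:
// "true" when no rows were affected, "false" otherwise.
function buildResultMessage(rowsAffected) {
  return rowsAffected === 0 ? "true" : "false";
}

// Function the code-behind would call, for example by registering the
// script text "myJavascriptFunction(0);".
// `notify` is optional and defaults to the browser's alert().
function myJavascriptFunction(rowsAffected, notify) {
  var show = notify || function (text) { alert(text); };
  show(buildResultMessage(rowsAffected));
}
```

The page would reference the file with an ordinary script tag (for example src="Scripts/notifications.js"), and the code-behind would pass the value returned by ExecuteNonQuery() as the argument in the registered script text. Keeping the message logic in a separate function makes it possible to check it without a browser.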
Regular Page
Page.ClientScript.RegisterStartupScript(this.GetType(), "myfunc" + UniqueID,
"myJavascriptFunction();", true);
Ajax Page
You need to use ScriptManager if you use ajax.
ScriptManager.RegisterStartupScript(Page, Page.GetType(), "myfunc" + UniqueID,
"myJavascriptFunction();", true);
Usually these "startup scripts" are handy for translations or for passing settings to JavaScript.
Although the solution Mike provided is correct on the .NET side, I doubt it is good practice in a clean (read: no spaghetti code) production environment. It would be better to add the .NET variables to a JavaScript object, like so:
// GA example
public static string GetAnalyticsSettingsScript()
{
var settings = new StringBuilder();
var logged = ProjectContext.CurrentUser != null ? "Logged" : "Not Logged";
var account = Configuration.Configuration.GoogleAnalyticsAccount;
// check the required objects since it might not yet exist
settings.AppendLine("Project = window.Project || {};");
settings.AppendLine("Project.analytics = Project.analytics || {};");
settings.AppendLine("Project.analytics.settings = Project.analytics.settings || {};");
settings.AppendFormat("Project.analytics.settings.account = '{0}';", account);
settings.AppendLine();
settings.AppendFormat("Project.analytics.settings.logged = '{0}';", logged);
settings.AppendLine();
return settings.ToString();
}
And then use the common Page.ClientScript.RegisterStartupScript to add it to the HTML.
private void RegisterAnalyticsSettingsScript()
{
string script = GoogleAnalyticsConfiguration.GetAnalyticsSettingsScript();
if (!string.IsNullOrEmpty(script))
{
Page.ClientScript.RegisterStartupScript(GetType(), "AnalyticsSettings", script, true);
}
}
On the JavaScript side it might look like this:
// IIFE
(function($){
// 1. CONFIGURATION
var cfg = {
trackingSetup: {
account: "UA-xxx-1",
allowLinker: true,
domainName: "auto",
siteSpeedSampleRate: 100,
pluginUrl: "//www.google-analytics.com/plugins/ga/inpage_linkid.js"
},
customVariablesSetup: {
usertype: {
slot: 1,
property: "User_type",
value: "Not Logged",
scope: 1
}
}
};
// 2. DOM PROJECT OBJECT
window.Project = window.Project || {};
window.Project.analytics = {
init: function(){
// loading ga.js here with ajax
},
activate: function(){
var proj = this,
account = proj.settings.account || cfg.trackingSetup.account,
logged = proj.settings.logged || cfg.customVariablesSetup.usertype.value;
// override the cfg with settings from .net
cfg.trackingSetup.account = account;
cfg.customVariablesSetup.usertype.value = logged;
// binding events, and more ...
}
};
// 3. INITIALIZE ON LOAD
Project.analytics.init();
// 4. ACTIVATE ONCE THE DOM IS READY
$(function () {
Project.analytics.activate();
});
}(jQuery));
The advantage of this setup is that you can load the object asynchronously and still override its settings from .NET. Because a configuration object is used, the values injected by the server go straight into it and replace the defaults when they are found.
This approach lets me easily pass translation strings, settings, and so on.
It requires a little knowledge of both .NET and JavaScript.
Please note that the real power of this approach lies in the "direct initialization" and the "delayed activation". This is necessary because you might not know when, during page load, these objects become available. The delay helps ensure that the proper objects are overridden.
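The dependence on load order can be made explicit. The sketch below is a simplified variant of the pattern, not the code from this answer: it merges into an existing Project.analytics object instead of replacing it, so settings injected by the server survive whichever script runs first. The root parameter stands in for window, and createAnalytics and the sample account values are hypothetical.

```javascript
// Defaults used when the server has not injected a value.
var defaults = { account: "UA-xxx-1", logged: "Not Logged" };

// Attaches the analytics module to `root` (window in a browser) without
// discarding a settings object that a server-side script may have added.
function createAnalytics(root) {
  root.Project = root.Project || {};
  var analytics = root.Project.analytics = root.Project.analytics || {};
  analytics.settings = analytics.settings || {};

  // Delayed activation: the settings are read only when activate() is called,
  // so values injected after this script loaded are still picked up.
  analytics.activate = function () {
    return {
      account: analytics.settings.account || defaults.account,
      logged: analytics.settings.logged || defaults.logged
    };
  };
  return analytics;
}
```

In a browser, createAnalytics(window) would run as soon as the script loads, and activate() would be called from the DOM-ready handler, mirroring steps 3 and 4 of the listing above.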
This might be a long shot, but sometimes I need a C# property value from the server side to be displayed or manipulated on the client side.
C# code-behind page
public string Name {get; set;}
JavaScript on Aspx page
var name = '<%=Name%>';
Populating to client side is generally easier, depending on your issue. Just a thought!