I'm trying to scrape a website to get data off of it. So far I got it to at least connect to the website, but now when I try to set the text of a textbox with the data, I just get a bunch of:
HtmlAgilityPack.HtmlNodeCollection
There are the same number of HtmlAgilityPack.HtmlNodeCollection as there is data. Here is my code ( it is a bit sloppy I know ):
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using System;
using HtmlAgilityPack;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
string choice;
public Form1()
{
InitializeComponent();
}
public void comboBox1_SelectedIndexChanged(object sender, System.EventArgs e)
{
}
public void button1_Click(object sender, System.EventArgs e)
{
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
string urlToLoad = "http://www.nbcwashington.com/weather/school-closings/";
HttpWebRequest request = HttpWebRequest.Create(urlToLoad) as HttpWebRequest;
request.Method = "GET";
Console.WriteLine(request.RequestUri.AbsoluteUri);
WebResponse response = request.GetResponse();
htmlDoc.Load(response.GetResponseStream(), true);
if (htmlDoc.DocumentNode != null)
{
var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");
if (articleNodes != null && articleNodes.Any())
{
foreach (var articleNode in articleNodes)
{
textBox1.AppendText(htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p").ToString());
}
}
}
Console.ReadLine();
}
private void listBox1_SelectedIndexChanged(object sender, System.EventArgs e)
{
choice = listBox1.SelectedItem.ToString();
}
}
}
So what am I missing / doing wrong here? The data should return something like:
Warren County Public Schools Closed
Washington Adventist University Closing at Noon
Thanks for looking at this.
Nevermind, found the issue. I guess I was trying to grab the document node instead of the inner text... Here is code just in case anyone wants it.
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;
using System.Windows.Forms;
using System;
using HtmlAgilityPack;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
string choice;
public Form1()
{
InitializeComponent();
}
public void comboBox1_SelectedIndexChanged(object sender, System.EventArgs e)
{
}
public void button1_Click(object sender, System.EventArgs e)
{
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
string urlToLoad = "http://www.nbcwashington.com/weather/school-closings/";
HttpWebRequest request = HttpWebRequest.Create(urlToLoad) as HttpWebRequest;
request.Method = "GET";
Console.WriteLine(request.RequestUri.AbsoluteUri);
WebResponse response = request.GetResponse();
htmlDoc.Load(response.GetResponseStream(), true);
if (htmlDoc.DocumentNode != null)
{
var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");
if (articleNodes != null && articleNodes.Any())
{
int k = 0;
foreach (var articleNode in articleNodes)
{
textBox1.AppendText(articleNode.InnerText + "\n");
}
}
}
Console.ReadLine();
}
private void listBox1_SelectedIndexChanged(object sender, System.EventArgs e)
{
choice = listBox1.SelectedItem.ToString();
}
}
}
Since articleNodes already contain the nodes you're interested in, there's no need to call SelectNodes() again inside a loop.
Also, you shouldn't need to check for null, since articleNodes is a collection. It might be empty, but shouldn't be null.
Try this, accessing the InnerHtml (or InnerText) property instead:
var articleNodes = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/p");
var result = articleNodes.Select(x => x.InnerHtml.Replace("<br><span>", " ")
.Replace(" </span>", "")).ToList();
Related
The goal of this project in C# is to browse for a txt file in each box (that coding works) and hit submit and each side will run to select the top ten repeated words in each listbox. I have got most of the code to work but I can only get book 1 side to run and not book 2. Attached is the code as well as the image to help understand! Using Windows Desktop (.NET) application.
using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace Asynchronous_Program
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void btnExit_Click(object sender, EventArgs e) => Close();
private async void btnSubmit_Click(object sender, EventArgs e)
{
listBox1.Items.Clear();
listBox2.Items.Clear();
List<Task> tasks = new List<Task>();
var file = openFileDialog1.FileName;
var wordTask = ReadBook(file);
tasks.Add(wordTask);
await Task.WhenAll(tasks);
//Other operations after this has finished
}
private async Task ReadBook(string filename)
{
var lines = await Task.FromResult(File.ReadAllLinesAsync(filename));
var arrayLinesWithoutPunctuation =
lines.Result.Where(x => x != String.Empty)
.AsParallel()
.Select(x => x.ToLower().Trim().Replace(",", "").Replace("{", ""));
var arrayOfWords = arrayLinesWithoutPunctuation.SelectMany(x => x.Split(" "));
Dictionary<string, int> wordDic = new Dictionary<string, int>();
arrayOfWords.ToList().ForEach(x =>
{
if (wordDic.ContainsKey(x))
{
wordDic[x] += 1;
}
else
{
wordDic.Add(x, 1);
}
});
var top10 = wordDic.Where(x =>
!string.IsNullOrWhiteSpace(x.Key)).OrderByDescending(x => x.Value).Take(10);
foreach(var word in top10)
{
listBox1.Items.Add(word.Key + " - " + word.Value);
listBox2.Items.Add(word.Key + " - " + word.Value);
}
}
private void btnBrowse_Click(object sender, EventArgs e)
{
DialogResult result = this.openFileDialog1.ShowDialog();
if (result == DialogResult.OK)
{
txtFileName.Text = openFileDialog1.FileName.Split("\\").LastOrDefault();
}
}
private void btnBrowse2_Click(object sender, EventArgs e)
{
DialogResult result = this.openFileDialog2.ShowDialog();
if (result == DialogResult.OK)
{
txtFileName2.Text = openFileDialog2.FileName.Split("\\").LastOrDefault();
}
}
}
}
why can't I add a new item/object to my list in button (or combobox etc.) events? I mean, the events don't see my list if it's outside of the brackets...it's underlined in red... can you help me?
long story short: i want to add a new object by clicking the button
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml.Serialization;
using System.IO;
namespace Samochody
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
List<Samochod> ListaSamochodow = new List<Samochod>();
comboBox1.DataSource = ListaSamochodow;
comboBox1.DisplayMember = "Marka";
XmlRootAttribute oRootAttr = new XmlRootAttribute();
XmlSerializer oSerializer = new XmlSerializer(typeof(List<Samochod>), oRootAttr);
StreamWriter oStreamWriter = null;
oStreamWriter = new StreamWriter("samochody.xml");
oSerializer.Serialize(oStreamWriter, ListaSamochodow);
}
private void button1_Click(object sender, EventArgs e)
{
try
{
ListaSamochodow.Add(new Samochod(textBox1.Text, textBox2.Text, Convert.ToInt32(textBox3.Text)));
}
catch (Exception oException)
{
Console.WriteLine("Aplikacja wygenerowała następujący wyjątek: " + oException.Message);
}
}
I think you should instantiate your list globally and not under Form1_Load event.
that way it will be accessible all around your class (the window in this case).
This seems to work fine. You might need to set your textboxes to contain valid values when you start the form. also make sure that you make the list visible throughout your form1 class.
namespace Samochody
{
public partial class Form1 : Form
{
// make sure your list looks like this, created outside your functions.
List<Samochod> ListaSamochodow = new List<Samochod>();
public Form1()
{
InitializeComponent();
label1.Text = "the amount in your list is " + ListaSamochodow.Count.ToString();
textBox1.Text = "string here";
textBox2.Text = "string here";
textBox3.Text = "100";
}
private void Form1_Load(object sender, EventArgs e)
{
XmlRootAttribute oRootAttr = new XmlRootAttribute();
XmlSerializer oSerializer = new XmlSerializer(typeof(List<Samochod>), oRootAttr);
StreamWriter oStreamWriter = null;
oStreamWriter = new StreamWriter("samochody.xml");
oSerializer.Serialize(oStreamWriter, ListaSamochodow);
}
private void button1_Click_1(object sender, EventArgs e)
{
Samochod s = new Samochod(textBox1.Text, textBox2.Text, Convert.ToInt32(textBox3.Text));
ListaSamochodow.Add(s);
label1.Text = "the amount in your list is " + ListaSamochodow.Count.ToString();
}
}
}
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Net;
using System.IO;
using HtmlAgilityPack;
using mshtml;
using System.Text.RegularExpressions;
namespace Extract_Images
{
public partial class Form1 : Form
{
private string[] linkstoextract;
private int numberoflinks;
private int currentLinkNumber = 0;
private string mainlink;
private WebClient client;
private WebBrowser webBrowser1;
private string htmlCode;
private bool pagesorimages = false;
public Form1()
{
InitializeComponent();
webBrowser1 = new WebBrowser();
webBrowser1.ScriptErrorsSuppressed = true;
webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
label1.Text = "Number of links: ";
mainlink = "http://www.test.com/";
numberoflinks = 13;
backgroundWorker1.RunWorkerAsync();
}
private void ProcessNextLink()
{
if (currentLinkNumber < numberoflinks)
{
currentLinkNumber++;
string linktonav = mainlink + "index"+currentLinkNumber.ToString() + ".html";
pagesorimages = false;
backgroundWorker1.ReportProgress(0,currentLinkNumber);
webBrowser1.Navigate(linktonav);
}
}
int count = 0;
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
mshtml.HTMLDocument objHtmlDoc = (mshtml.HTMLDocument)webBrowser1.Document.DomDocument;
string pageSource = objHtmlDoc.documentElement.innerHTML;
List<string> links = new List<string>();
string[] hrefs = this.webBrowser1.Document.Links.Cast<HtmlElement>()
.Select(a => a.GetAttribute("href")).Where(h => h.Contains(".jpg")).ToArray();
foreach(string a in hrefs)
{
using (WebClient client = new WebClient())
{
client.DownloadFile(a, #"C:\Images\file" + count + ".jpg");
}
count ++;
pagesorimages = true;
backgroundWorker1.ReportProgress(0, count);
}
//ProcessNextLink();
}
private void Form1_Load(object sender, EventArgs e)
{
}
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
ProcessNextLink();
}
private void backgroundWorker1_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
if (pagesorimages == false)
{
label1.Text = e.UserState.ToString();
}
if (pagesorimages == true)
{
label2.Text = e.UserState.ToString();
}
}
private void backgroundWorker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
}
}
}
The exception is on the second ReportProgress:
backgroundWorker1.ReportProgress(0, count);
This operation has already had OperationCompleted called on it and further calls are illegal
What i want to do is to report first the current page number in this case it's 1 to label1.
And then to report the number of downloaded images in this page to label2.
Then to move to the next page with the method ProcessNextLink(); and again report the page number it should be 2 and then to report the number of images downloaded in page 2.
But i'm getting this exception already on the first page.
It was working fine without the backgroundworker in the event webBrowser1_DocumentCompleted i called ProcessNextLink(); in the bottom and it was working fine. But with the backgroundworker it's not working.
Read them back in app constructor and also maybe in other places in program.
I have a new form i created with some checkboxes:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.IO;
namespace Options
{
public partial class OptionsMenuForm : Form
{
public OptionsMenuForm()
{
InitializeComponent();
}
private void checkBox1_CheckedChanged(object sender, EventArgs e)
{
if (checkBox1.Checked)
{
Settings.downloadonstart = true;
}
else
{
Settings.downloadonstart = false;
}
}
private void checkBox2_CheckedChanged(object sender, EventArgs e)
{
if (checkBox2.Checked)
{
Settings.loadonstart = true;
}
else
{
Settings.loadonstart = false;
}
}
private void checkBox3_CheckedChanged(object sender, EventArgs e)
{
if (checkBox3.Checked)
{
Settings.startminimized = true;
}
else
{
Settings.startminimized = false;
}
}
private void checkBox4_CheckedChanged(object sender, EventArgs e)
{
if (checkBox4.Checked)
{
Settings.displaynotifications = true;
}
else
{
Settings.displaynotifications = false;
}
}
}
}
Now i want to save each time the state of one/any of the checkboxes.
I also added a class i'm using th pass the variables between the new form and form1:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Options
{
class Settings
{
public static bool downloadonstart;
public static bool loadonstart;
public static bool startminimized;
public static bool displaynotifications;
}
}
Now how can i use this in form1 by saving the settings to a text file ?
For example in the text file the content will be something like:
CheckBox1 = true
CheckBox2 = false
CheckBox3 = false
CheckBox4 = true
And then if i change the state of one of them it will write it in the text file:
CheckBox1 = false
CheckBox2 = false
CheckBox3 = true
CheckBox4 = true
In form1 top i added
string settingsFile = "settings.txt";
string settingsFileDirectory = "\\settings";
StreamWriter writetosettingsfile;
Then in constructor
settingsFileDirectory = Path.GetDirectoryName(Application.LocalUserAppDataPath) +
settingsFileDirectory;
if (!Directory.Exists(settingsFileDirectory))
{
Directory.CreateDirectory(settingsFileDirectory);
}
I know the app it self have a settings in the properties but i wanted to use a text file this time since i have many settings and i might have more later.
Or using the settings in the app in properties i did:
But now how do i use with it in my program to save every checkbox state in the new form and then using it in form1 ?
You can use json.net
Something like this to save the data
private void Form1_Load(object sender, EventArgs e)
{
checkBox1.CheckedChanged += checkBox_CheckedChanged;
checkBox2.CheckedChanged += checkBox_CheckedChanged;
checkBox3.CheckedChanged += checkBox_CheckedChanged;
checkBox4.CheckedChanged += checkBox_CheckedChanged;
}
private void checkBox_CheckedChanged(object sender, EventArgs e)
{
var settings = new Settings();
settings.downloadonstart = checkBox1.Checked;
settings.loadonstart = checkBox2.Checked;
settings.startminimized = checkBox3.Checked;
settings.displaynotifications = checkBox4.Checked;
File.WriteAllText(#"c:\configfile.json", JsonConvert.SerializeObject(settings));
}
You can read the file like this
Settings settings = JsonConvert.DeserializeObject<Settings>(File.ReadAllText(#"c:\configfile.json"));
Documentation:
http://www.newtonsoft.com/json/help/html/Introduction.htm
public OptionsMenuForm()
{
InitializeComponent();
//Read the settings from a file with comma delimited 1's and 0's or whatever you like
string[] values = System.IO.File.ReadAllText("c:\temp\a.txt").Split(',');
checkBox1.Checked = Convert.ToBoolean(Convert.ToInt32(values[0]));
checkBox2.Checked = Convert.ToBoolean(Convert.ToInt32(values[1]));;
//On checkbox changes save the settings
checkBox1.CheckedChanged +=SaveSettings;
checkBox2.CheckedChanged +=SaveSettings;
}
public void SaveSettings(object sender,EventArgs e)
{
StringBuilder sbValues = new StringBuilder();
int i = 0;
i = checkBox1.Checked ? 1 : 0;
sbValues.Append(i.ToString() + ",");
i = checkBox2.Checked ? 1 : 0;
sbValues.Append(i.ToString() + ",");
System.IO.File.WriteAllText("c:\temp\a.txt",sbValues.ToString());
}
It is recommended that you use XML when using the .NET framework.
public static void ConvertStringToXmlList()
{
XElement xml = new XElement
(
"Items",
(
from x in [YourList] select new XElement("Item", x)
)
);
XDocument doc = new XDocument
(
xml
);
doc.Save(Environment.CurrentDirectory + "/settings.xml");
}
var xmlReader = new XmlTextReader("settings.xml");
while (xmlReader.Read())
{
switch (reader.NodeType)
{
case [nodetype]:
// Code here
break;
}
}
Also, instead of creating the same event for all four checkbox's, just use 1.
Inside of the single event put:
ckCheckBox1.Checked = !ckCheckBox1.Checked;
All of a sudden my form window has begun to close as soon as the application is launched. There's nothing in the output window that gives a hint as to what could be causing it and there are no errors thrown at me either. Does anybody have any ideas?
I've provided to form's class.
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
namespace ProjectBoardManagement {
public partial class CreateBoard : Form {
Functions funcs = new Functions();
public CreateBoard() {
InitializeComponent();
}
private void CreateBoardButton_Click(object sender, EventArgs e) {
String BoardName = BoardNameText.Text;
String Pages = "";
String Labels = "";
foreach (ListViewItem i in PageNameList.Items) {
Pages = (Pages + i.Name.ToString() + ",");
}
foreach (ListViewItem i in LabelNameList.Items) {
Labels = (Labels + i.Name.ToString() + ",");
}
String BoardFile = ("board_" + BoardName + ".txt");
funcs.SaveSetting(BoardFile, "name", BoardName);
funcs.SaveSetting(BoardFile, "pages", Pages);
funcs.SaveSetting(BoardFile, "labels", Labels);
FormManagement.CreateBoard.Hide();
FormManagement.BoardList.LoadBoardList();
}
private void PageNameButtonAdd_Click(object sender, EventArgs e) {
String pagename = PageNameText.Text;
if (pagename != "") {
PageNameList.Items.Add(pagename);
}
PageNameText.Text = "";
}
private void LabelNameButtonAdd_Click(object sender, EventArgs e) {
String labelname = LabelNameText.Text;
if (labelname != "") {
LabelNameList.Items.Add(labelname);
}
LabelNameText.Text = "";
}
}
}
Obvious first thing to do - run it in Debug mode and stop execution on all exceptions. This should give you enough information on how to go from there.
Otherwise Functions funcs = new Functions(); looks suspicious.