webClient.DownloadString - extracting source code - not all elements are being picked up - c#

The following script is supposed to write the source code of the URL to a .txt file:
private void buttonFetch_Click(object sender, EventArgs e)
{
using (WebClient webClient = new WebClient())
{
string itemSelection = comboBoxItem.Text.ToString();
string s;
if (itemSelection == "item1")
{
s = webClient.DownloadString("http://www.ebay.com/sch/i.html?_odkw=batman+action+figure&_ftrv=1&_sadis=200&_ipg=200&_sop=12&LH_SALE_CURRENCY=0&_osacat=0&_from=R40&_dmd=1&_ftrt=901&_trksid=p2045573.m570.l1313.TR0.TRC0.XBatman+Arkham+Origins+Series+1+&_nkw=Batman+Arkham+Origins+Series+1+&_sacat=0");
}
else
{
s = webClient.DownloadString("http://www.othersite.com");
}
string fixedString = s.Replace("\n", "\r\n");
System.IO.File.WriteAllText(@"C:\Users\Alex\Dropbox\Personal Projects\loadSheet.txt", fixedString);
MessageBox.Show("New items are ready to browse.","",
MessageBoxButtons.OK, MessageBoxIcon.Asterisk);
}
}
Note: each URL produces a different eBay search result.
The element that is not being picked up is <span class="fee">.
However, when I paste the URL into Chrome and right-click --> View Page Source, I can see this element in the source code.
As far as I know, all of the other elements are being written to the text file.
Is there any way to ensure this element will be picked up?
Note: I had to customize the search results in order to see that span in the source code. To do this, click View, then click Customize, then check Shipping Cost. The span should then appear in the source code when you view it manually in Chrome. However, I cannot get the span to appear in the output when I use webClient.DownloadString.

Related

Scrape data from div in Windows.Form

I am new to C# programming. I am trying to scrape data from a div (I want to display the temperature from a web page in a Windows Forms application).
This is my code:
private void btnOnet_Click(object sender, EventArgs e)
{
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
var temperatura = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[3]/div/section/div/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]");
onet.Text = temperatura.InnerText;
}
This is the exception:
System.NullReferenceException:
temperatura was null.
You can use this:
public static bool TryGetTemperature(HtmlAgilityPack.HtmlDocument doc, out int temperature)
{
temperature = 0;
var temp = doc.DocumentNode.SelectSingleNode(
"//div[contains(@class, 'temperature')]/div[contains(@class, 'temp')]");
if (temp == null)
{
return false;
}
// The page encodes the degree sign as the 5-character entity "&deg;".
var text = temp.InnerText.EndsWith("&deg;") ?
temp.InnerText.Substring(0, temp.InnerText.Length - 5) :
temp.InnerText;
return int.TryParse(text, out temperature);
}
With a relative XPath query you can select your target more precisely. With your absolute query, a small change in the HTML structure will make your application fail. Some points:
// searches anywhere in the document.
The query looks for any div whose class contains "temperature" and, inside that node,
a child div whose class contains "temp".
If that node is found (!= null), the method tries to parse the degrees (after removing the trailing degree marker).
Then use it like this:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
if (TryGetTemperature(doc, out int temperature))
{
onet.Text = temperature.ToString();
}
UPDATE
I updated TryGetTemperature slightly because the degree symbol is encoded. The main problem is the HTML itself. When you request the page source, you get HTML that the browser later updates dynamically, so the HTML you receive is not usable: it doesn't contain the temperature.
So I see two alternatives:
You can use a browser control (the WebBrowser control under Common Controls in the Toolbox, next to Button, Label...), add it to your form, and navigate to the page. It's not difficult, but you need to learn a few things: wait for the event that signals the page has loaded, then get the source code from the control. I also suppose you'll want to hide the browser control. Be careful: sometimes the browser doesn't work correctly when it is hidden. In that case, you can use a visible form positioned off the desktop and handle the activation events so the window is never activated. Also hide it from the task switcher (Alt+Tab). Things get harder this way, but sometimes it is the only way.
The simpler way is to search for the location you want (e.g. Madryt) and look in DevTools at the request that is made (e.g. https://pogoda.onet.pl/prognoza-pogody/madryt-396099). Use that URL and you get valid HTML.
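The degree-stripping step in TryGetTemperature can be pulled out into a small helper so it can be checked without a network request. This is only a sketch: the class name TemperatureText is made up for illustration, and it assumes the node text ends with either the entity "&deg;" or a literal "°", depending on whether the entity has already been decoded.

```csharp
using System;
using System.Globalization;

public static class TemperatureText
{
    // Hypothetical helper: removes a trailing degree marker (the HTML
    // entity "&deg;" or the literal degree sign) and parses the integer.
    public static bool TryParse(string innerText, out int temperature)
    {
        temperature = 0;
        if (string.IsNullOrWhiteSpace(innerText))
        {
            return false;
        }

        string text = innerText.Trim();
        foreach (string suffix in new[] { "&deg;", "°" })
        {
            if (text.EndsWith(suffix, StringComparison.Ordinal))
            {
                // Cut exactly as many characters as the marker has.
                text = text.Substring(0, text.Length - suffix.Length);
                break;
            }
        }

        return int.TryParse(text, NumberStyles.Integer,
            CultureInfo.InvariantCulture, out temperature);
    }
}
```

Inside TryGetTemperature you would pass temp.InnerText to it once the node has been found. Cutting by the marker's own length avoids hard-coding the number of characters to remove.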

How to Save Html data from a website to a text file using Xamarin forms and C#

I'm using C# and Xamarin.Forms to create a phone app that, when a button is pressed, pulls specific HTML data from a website and saves it to a text file that the program can read again later. I started with the tutorial in this video: https://www.youtube.com/watch?v=zvp7wvbyceo if you want to see what I started out with. Here's the code I have so far, written following this video, https://www.youtube.com/watch?v=wwPx8QJn9Kk, in the "AboutViewModel.cs" file created in the video:
(Image link, because this is a new account and I can't embed images.)
Paste of the code itself (but the image gives you a better look at everything):
private Task WebScraper()
{
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("https://www.flightview.com/airport/DAB-Daytona_Beach-FL/");
foreach (var item in doc.DocumentNode.SelectNodes("//td[@class='c1']"))
{
var itemstring = item;
File.WriteAllText("AirportData.txt", itemstring);
}
return Task.CompletedTask;
}
public ICommand OpenWebCommand { get; }
public ICommand WebScraperCommand { get; }
}
}
The only error I'm getting right now is "Cannot convert 'HtmlAgilityPack.HtmlNode' to 'string'", which I'm working on fixing, but I don't think this is the best solution, so any suggestions are useful. Thanks :)
HtmlNode is an object, not a simple string. You probably want to use the OuterHtml property, but consult the docs to see if that is the right fit for your use case:
string output = string.Empty;
foreach (var item in doc.DocumentNode.SelectNodes("//td[@class='c1']"))
{
output += item.OuterHtml;
}
File.WriteAllText("AirportData.txt", output);
Note that you need to specify a path to a writable folder; the root folder of the app is not writable. See https://learn.microsoft.com/en-us/xamarin/xamarin-forms/data-cloud/data/files?tabs=windows
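As a rough sketch of the writable-folder point, the file path can be built from the per-user application data folder instead of a bare file name. The class and method names here (AirportDataFile, GetPath, Save) are made up for illustration, and it assumes LocalApplicationData is a suitable location for your app.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class AirportDataFile
{
    // Hypothetical helper: builds a path inside the per-user application
    // data folder, which the app is allowed to write to.
    public static string GetPath(string fileName)
    {
        string folder = Environment.GetFolderPath(
            Environment.SpecialFolder.LocalApplicationData);
        return Path.Combine(folder, fileName);
    }

    // Joins the fragments, writes them in a single call, and returns
    // the path that was written.
    public static string Save(IEnumerable<string> htmlFragments, string fileName)
    {
        var output = new StringBuilder();
        foreach (string fragment in htmlFragments)
        {
            output.AppendLine(fragment);
        }

        string path = GetPath(fileName);
        File.WriteAllText(path, output.ToString());
        return path;
    }
}
```

With the loop above, the fragments would be each node's OuterHtml. It is also worth checking the result of SelectNodes for null before looping, since HtmlAgilityPack returns null rather than an empty collection when nothing matches.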

Reading text and variables from text file c#

I have the following code, which reads data from a text file (so users can modify it easily) and auto-formats a paragraph based on the words in the text file plus variables in the form. The "Body" file is loaded into a field. My body text file contains the following:
"contents: " + contents
Based on that, I was hoping to get
contents: Item 1, 2, etc.
based on my input. Instead I get exactly what's in the text file, despite the quotation marks. What am I doing wrong? I was hoping to get the variables' values in addition to my text.
string readSettings(string name)
{
string path = System.Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments) + "/Yuneec_Repair_Inv";
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(path + "/" + name + ".txt"))
{
string data = sr.ReadToEnd();
return data;
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The settings file for " + name + " could not be read:");
Console.WriteLine(e.Message);
string content = "error";
return content;
}
}
private void Form1_Load(object sender, EventArgs e)
{
createSettings("Email");
createSettings("Subject");
createSettings("Body");
yuneecEmail = readSettings("Email");
subject = readSettings("Subject");
body = readSettings("Body");
}
private void button2_Click(object sender, EventArgs e)
{
bodyTextBox.Text = body;
}
If you want to let your users customize certain parts of the text, you should use a marker that you know beforehand and that can be searched for and parsed out; for example, everything between # and # is read as a string.
Hello #Mr Douglas#,
Today is #DayOfTheWeek#.....
Your users can then replace whatever they need between the # symbols, and you read that (for example using regular expressions) and use it as your "variable" text.
Let me know if this is what you are after and I can provide some C# code as an example.
OK, this is the example code for that (it uses [ and ] as the markers):
StreamReader sr = new StreamReader(@"C:\temp\settings.txt");
var set = sr.ReadToEnd();
var settings = new Regex(@"(?<=\[)(.*?)(?=\])").Matches(set);
foreach (var setting in settings)
{
Console.WriteLine("Parameter read from settings file is " + setting);
}
Console.WriteLine("Press any key to finish program...");
Console.ReadKey();
And this is the source of the text file:
Hello [MrReceiver],
This is [User] from [Company] something else, not very versatile using this as an example :)
[Signature]
Hope this helps!
When you read text from a file as a string, you get a string of text, nothing more.
There's no part of the system which assumes it's C#, parses, compiles and executes it in the current scope, casts the result to text and gives you the result of that.
That would be mostly not what people want, and would be a big security risk - the last thing you want is to execute arbitrary code from outside your program with no checks.
If you need a templating engine, you need to build one - e.g. read in the string, process the string looking for keywords, e.g. %content%, then add the data in where they are - or find a template processing library and integrate it.
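A minimal sketch of the keyword approach described above, assuming placeholders are written as %name% and the values come from a dictionary. SimpleTemplate and Render are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SimpleTemplate
{
    // Matches placeholders such as %contents% (letters, digits, underscore).
    private static readonly Regex Placeholder = new Regex(@"%(\w+)%");

    // Replaces every known placeholder with its value from the dictionary.
    public static string Render(string template, IDictionary<string, string> values)
    {
        return Placeholder.Replace(template, match =>
        {
            string key = match.Groups[1].Value;
            // Unknown placeholders are left untouched so typos stay visible.
            return values.TryGetValue(key, out string value) ? value : match.Value;
        });
    }
}
```

The body file from the question would then contain contents: %contents%, and the form would call Render with the text returned by readSettings("Body") and a dictionary that maps "contents" to the current value. Nothing from the file is executed; it is only searched and substituted.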

C# Method - Publish Entire Sitecore Site

I have an odd request. I am wondering whether it is possible to have C# code publish the entire site from the master database to the web database. I am in my development environment, and the number of items in Sitecore changes daily because I am working with multiple people. This is not going to be used for production Content Management or Content Delivery; it is purely for development.
Is it possible to trigger a full Site publish in C# just like pressing the publish button in the content editor? And what would the code look like?
/sitecore/shell/Applications/Publish.aspx
I assume this C# method would work with Sitecore.Publishing.PublishManager?
Thanks!
We also use that on our projects...
We have that placed in a regular .aspx page. I hope that helps:
protected void Page_Load(object sender, EventArgs e)
{
PublishMode publishMode = PublishMode.Full;
using (new Sitecore.SecurityModel.SecurityDisabler())
{
var webDb = Sitecore.Configuration.Factory.GetDatabase("web");
var masterDb = Sitecore.Configuration.Factory.GetDatabase("master");
try
{
foreach (Language language in masterDb.Languages)
{
// Loop over the languages and do a full republish of the whole Sitecore content tree
var options = new PublishOptions(masterDb, webDb, publishMode, language, DateTime.Now) { RootItem = masterDb.Items["/sitecore"], RepublishAll = true, Deep = true };
var myPublisher = new Publisher(options);
myPublisher.Publish();
}
}
catch (Exception ex)
{
Sitecore.Diagnostics.Log.Error("Could not publish", ex);
}
}
}

Good ASP.NET method for checking, reading and returning file contents

I found a good way to check if a file exists and read the contents if it does, but for some reason I can't create a method out of it.
Here's what I have so far:
<script runat="server">
void Page_Load(Object s, EventArgs e) {
lblFunction.Text = mwbInclude("test.txt");
}
string mwbInclude(string fileName) {
string inc = Server.MapPath("/extra/include/" + Request["game"] + "/" + fileName);
string valinc;
if(System.IO.File.Exists(inc))
{
valinc = System.IO.File.ReadAllText(inc);
}
return valinc;
}
</script>
I wish I could provide more info, but the server this is on doesn't show any feedback on errors, just a 404 page.
I think
valinc = Response.Write(System.IO.File.ReadAllText(inc));
should be
valinc = System.IO.File.ReadAllText(inc);
Why are you setting the Text property and calling Response.Write? Do you want to render the text as a label, or as the whole response?
If you're getting a 404, it's because your page isn't being found, not because there's a problem with the script itself. Have you tried ripping out all of the code and just sticking in some HTML tags as a sanity check?
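One more thing that may be worth checking in the method from the question: valinc is only assigned inside the if, and the C# compiler rejects returning a local variable that might be unassigned (error CS0165). A sketch of the same check-and-read logic with a value on every path follows; IncludeFile and ReadOrEmpty are made-up names, and returning an empty string for a missing file is an assumption about what you want.

```csharp
using System.IO;

public static class IncludeFile
{
    // Returns the file's text, or an empty string when the file is
    // missing, so a value is returned on every code path.
    public static string ReadOrEmpty(string path)
    {
        return File.Exists(path) ? File.ReadAllText(path) : string.Empty;
    }
}
```

In the page, the path argument would still come from Server.MapPath, as in the question.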
