Given this URL:
http://www.dreamincode.net/forums/xml.php?showuser=1253
How can I download the resulting XML file and have it loaded to memory so I can grab information from it using Linq?
Thanks for the help.
Why complicate things? This works:
var xml = XDocument.Load("http://www.dreamincode.net/forums/xml.php?showuser=1253");
Load string:
string xml = new WebClient().DownloadString(url);
Then load into XML:
XDocument doc = XDocument.Parse(xml);
For example:
[Test]
public void TestSample()
{
    string url = "http://www.dreamincode.net/forums/xml.php?showuser=1253";
    string xml;
    using (var webClient = new WebClient())
    {
        xml = webClient.DownloadString(url);
    }
    XDocument doc = XDocument.Parse(xml);
    // in the result, the profile with id 1253 has the name 'Nate'
    string name = doc.XPathSelectElement("/ipb/profile[id='1253']/name").Value;
    Assert.That(name, Is.EqualTo("Nate"));
}
You can use the WebClient class:
WebClient client = new WebClient();
Stream data = client.OpenRead("http://example.com");
StreamReader reader = new StreamReader(data);
string s = reader.ReadToEnd();
Console.WriteLine(s);
data.Close();
reader.Close();
Though using DownloadString is easier:
WebClient client = new WebClient();
string s = client.DownloadString("http://example.com");
You can load the resulting string into an XmlDocument.
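For instance, a minimal sketch combining the two steps (the URL is the one from the question; the XPath mirrors the document structure shown in the other answer and is an assumption here):

```csharp
using System;
using System.Net;
using System.Xml;

class Program
{
    static void Main()
    {
        // Download the XML as a string...
        string s;
        using (var client = new WebClient())
        {
            s = client.DownloadString("http://www.dreamincode.net/forums/xml.php?showuser=1253");
        }

        // ...then load it into an XmlDocument and query it.
        var doc = new XmlDocument();
        doc.LoadXml(s);

        // Path assumed from the profile XML shown elsewhere on this page.
        XmlNode name = doc.SelectSingleNode("/ipb/profile/name");
        if (name != null)
        {
            Console.WriteLine(name.InnerText);
        }
    }
}
```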
Related
I am trying to download whatever is between <code></code> tags on a website; unfortunately, selecting nodes with "//code" returns null, and I don't know why. This is my code:
public void TAF_download()
{
    var html = @"https://www.aviationweather.gov/taf/data?ids=KDEN&format=raw&metars=off&layout=off/";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var TAF = doc.DocumentNode.SelectSingleNode("//code");
    Console.WriteLine(TAF.OuterHtml);
}
The argument to HtmlDocument.LoadHtml(string html) needs to be HTML, not a URL.
You may try this (no exception handling included):
public void TAF_download()
{
    var url = @"https://www.aviationweather.gov/taf/data?ids=KDEN&format=raw&metars=off&layout=off/";
    string html;
    var request = WebRequest.CreateHttp(url);
    using (var response = request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        html = reader.ReadToEnd();
    }
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var TAF = doc.DocumentNode.SelectSingleNode("//code");
    Console.WriteLine(TAF.OuterHtml);
}
There is also an HtmlAgilityPack.HtmlWeb class that supports downloading a URL, but I generally don't use it myself (I actually forgot about it).
For example:
public void TAF_download()
{
    var url = @"https://www.aviationweather.gov/taf/data?ids=KDEN&format=raw&metars=off&layout=off/";
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load(url);
    var TAF = doc.DocumentNode.SelectSingleNode("//code");
    Console.WriteLine(TAF.OuterHtml);
}
With that said, you should look for a better data source, one that doesn't require scraping HTML... maybe one of the options listed here: https://www.aviationweather.gov/dataserver
I've got a problem with the XDocument class. I am loading information from an online XML API for my first WP8.1 app, like this:
try
{
    var xmlDoc = XDocument.Load(_url);
    return xmlDoc;
}
catch (XmlException)
{
    HttpClient http = new HttpClient();
    HttpResponseMessage response = await http.GetAsync(new Uri(_url));
    var webresponse = await response.Content.ReadAsStringAsync();
    var content = XmlCharacterWhitelist(webresponse);
    var xmlDoc = XDocument.Parse(content);
    return xmlDoc;
}
But both ways return the wrong encoding; for example, German umlauts are displayed incorrectly. Every XML file I load has
xml version="1.0" encoding="utf-8"
in the top line. Any ideas?
Rather than read the data into a byte array and decoding it yourself, just read it as a stream and let XDocument.Load detect the encoding from the data:
using (HttpClient http = new HttpClient())
{
    using (var response = await http.GetAsync(new Uri(_url)))
    {
        using (var stream = await response.Content.ReadAsStreamAsync())
        {
            return XDocument.Load(stream);
        }
    }
}
I fixed it by doing this:
HttpClient http = new HttpClient();
var response = await http.GetAsync(new Uri(_url));
var buffer = await response.Content.ReadAsBufferAsync();
byte[] byteArray = buffer.ToArray();
string content = Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
var xmlDoc = XDocument.Parse(content);
return xmlDoc;
Using an XmlReader should resolve the issue:
string content = "your xml here";
StringReader sReader = new StringReader(content);
XmlTextReader xReader = new XmlTextReader(sReader);
XDocument xmlDoc = XDocument.Load(xReader);
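For instance, with a small inline document standing in for "your xml here" (the XML below is made up for illustration; XmlReader.Create is used here in place of the older XmlTextReader, and both work with XDocument.Load):

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // A stand-in for the XML downloaded from the API.
        string content = "<ipb><profile><name>Nate</name></profile></ipb>";

        using (var sReader = new StringReader(content))
        using (var xReader = XmlReader.Create(sReader))
        {
            // Because the input is already a character stream, the parser
            // does not try to re-decode bytes from a declared encoding.
            XDocument xmlDoc = XDocument.Load(xReader);
            Console.WriteLine(xmlDoc.Root.Element("profile").Element("name").Value);
        }
    }
}
```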
I've been looking for a simple web crawler, and I need to search for elements inside my StringBuilder or string. For example, I need to get all content inside a div with id "bodyDiv". Which tool can help me with this?
private static string GetPage(string url)
{
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
    request.UserAgent = "Simple crawler";
    WebResponse response = request.GetResponse();
    Stream stream = response.GetResponseStream();
    StreamReader reader = new StreamReader(stream);
    string htmlText = reader.ReadToEnd();
    return htmlText;
}
I would use HtmlAgilityPack
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlText);
var div = doc.DocumentNode.SelectSingleNode("//div[@id='bodyDiv']");
if (div != null)
{
    var yourtext = div.InnerText;
}
I have a C# Windows Forms app that launches a webpage based on some criteria.
Now I would like my app to automatically copy all the text from that page (which is in CSV format) and paste and save it in notepad.
Here is a link to an example of the data that needs to be copied:
http://www.wunderground.com/history/airport/FAJS/2012/10/28/DailyHistory.html?req_city=Johannesburg&req_state=&req_statename=South+Africa&format=1
Any Help will be appreciated.
You can use the new toy HttpClient from .NET 4.5; for example, to fetch the Google home page:
var httpClient = new HttpClient();
File.WriteAllText("C:\\google.txt",
    httpClient.GetStringAsync("http://www.google.com").Result);
http://msdn.microsoft.com/en-us/library/fhd1f0sw.aspx combined with http://www.dotnetspider.com/resources/21720-Writing-string-content-file.aspx
public static void DownloadString()
{
    WebClient client = new WebClient();
    string reply = client.DownloadString("http://www.wunderground.com/history/airport/FAJS/2012/10/28/DailyHistory.html?req_city=Johannesburg&req_state=&req_statename=South+Africa&format=1");
    string stringData = reply;
    FileStream fs = new FileStream(@"C:\Temp\tmp.txt", FileMode.Create);
    byte[] buffer = new byte[stringData.Length];
    for (int i = 0; i < stringData.Length; i++)
    {
        buffer[i] = (byte)stringData[i];
    }
    fs.Write(buffer, 0, buffer.Length);
    fs.Close();
}
Edit: Adil uses the WriteAllText method, which is even better. So you will get something like this:
WebClient client = new WebClient();
string reply = client.DownloadString("http://www.wunderground.com/history/airport/FAJS/2012/10/28/DailyHistory.html?req_city=Johannesburg&req_state=&req_statename=South+Africa&format=1");
System.IO.File.WriteAllText(@"C:\Temp\tmp.txt", reply);
Simple way: use WebClient.DownloadFile and save as a .txt file:
var webClient = new WebClient();
webClient.DownloadFile("http://www.google.com", @"c:\google.txt");
You can use WebRequest to read the response stream into a string, and then File.WriteAllText to write it to a text file.
WebRequest request = WebRequest.Create("http://www.contoso.com/default.html");
request.Credentials = CredentialCache.DefaultCredentials;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Console.WriteLine(response.StatusDescription);
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
string responseFromServer = reader.ReadToEnd();
System.IO.File.WriteAllText(@"D:\path.txt", responseFromServer);
You may use a WebClient to do this:
System.Net.WebClient wc = new System.Net.WebClient();
byte[] raw = wc.DownloadData("http://www.wunderground.com/history/airport/FAJS/2012/10/28/DailyHistory.html?req_city=Johannesburg&req_state=&req_statename=South+Africa&format=1");
string webData = System.Text.Encoding.UTF8.GetString(raw);
The string webData then contains the complete text of the webpage.
I used this solution to read and parse an RSS feed from an ASP.NET website, and it worked perfectly. However, when trying it on another site, an error occurs because "System does not support 'utf8' encoding." Below I have included an extract of my code.
private void Form1_Load(object sender, EventArgs e)
{
    lblFeed.Text = ProcessRSS("http://buypoe.com/external.php?type=RSS2", "ScottGq");
}
public static string ProcessRSS(string rssURL, string feed)
{
    WebRequest request = WebRequest.Create(rssURL);
    WebResponse response = request.GetResponse();
    StringBuilder sb = new StringBuilder("");
    Stream rssStream = response.GetResponseStream();
    XmlDocument rssDoc = new XmlDocument();
    rssDoc.Load(rssStream);
    XmlNodeList rssItems = rssDoc.SelectNodes("rss/channel/item");
    string title = "";
    string link = "";
    ...
The error occurs at "rssDoc.Load(rssStream);". Any help in encoding the xml correctly would be appreciated.
Use the following code to read the stream with an explicit encoding:
System.IO.StreamReader stream = new System.IO.StreamReader(
    response.GetResponseStream(), System.Text.Encoding.GetEncoding("utf-8"));
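Plugged into the question's ProcessRSS method, that could look like the sketch below. The idea is that loading through a StreamReader hands the parser already-decoded characters, so the unrecognized 'utf8' charset label no longer trips it up:

```csharp
// Assumes the same rssURL parameter as in the question's ProcessRSS method.
WebRequest request = WebRequest.Create(rssURL);
WebResponse response = request.GetResponse();

XmlDocument rssDoc = new XmlDocument();
using (var reader = new System.IO.StreamReader(
    response.GetResponseStream(), System.Text.Encoding.GetEncoding("utf-8")))
{
    // Loading from a TextReader: the bytes are decoded as UTF-8 up front,
    // so the parser does not try to honor the 'utf8' name from the prolog.
    rssDoc.Load(reader);
}
XmlNodeList rssItems = rssDoc.SelectNodes("rss/channel/item");
```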