I have an app that gets a daily feed from an external RSS feed (this data is in XML).
I have a search form that allows users to search my database; however, I would like to use the same search string that users enter on my site to search this RSS feed, extract only what's relevant, and display that on my website.
I have been looking at reading an XML file with LINQ using this code:
XElement xelement = XElement.Load("..\\..\\Employees.xml");
IEnumerable<XElement> employees = xelement.Elements();
Console.WriteLine("List of all Employee Names along with their ID:");
foreach (var employee in employees)
{
    Console.WriteLine("{0} has Employee ID {1}",
        employee.Element("Name").Value,
        employee.Element("EmpId").Value);
}
The problem I have with this is: where in the code do I use a URL instead of a file name? That is,
XElement xelement = XElement.Load("..\\..\\Employees.xml");
should be:
XElement xelement = XElement.Load("http://www.test.com/file.xml");
I am thinking maybe I should store the content in an array or something, and check whether the searchString is in it?
I am not sure how to proceed or what's best to use; maybe I shouldn't even use LINQ?
Using the responses below, here is what I have done:
public void myXMLTest()
{
    WebRequest request = WebRequest.Create("http://www.test.com/file.xml");
    WebResponse response = request.GetResponse();
    Stream dataStream = response.GetResponseStream();
    XElement xelement = XElement.Load(dataStream);
    IEnumerable<XElement> employees = xelement.Elements();
    MessageBox.Show("List of all Employee Names along with their ID:");
    foreach (var employee in employees)
    {
        MessageBox.Show(employee.Name.ToString());
        /* the above message box gives me this:
           {http://www.w3.org/2005/Atom}id
           {http://www.w3.org/2005/Atom}name
           {http://www.w3.org/2005/Atom}title
           etc
        */
        MessageBox.Show(employee.Element("name").Value); // this gives me error
    }
}
You're going to have to do a bit more work than just providing a URL.
Instead, you're going to need to fetch the XML file using the WebRequest class. Provided the request is successful, you can then turn around and use the response stream as your argument to XElement.Load.
Example (illustrative only; for the love of Pete, add some error handling):
WebRequest request = WebRequest.Create("http://www.test.com/file.xml");
WebResponse response = request.GetResponse();
Stream dataStream = response.GetResponseStream();
XElement doc = XElement.Load(dataStream);
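As for the follow-up error in the question's update: the element names printed by the message box show the feed is in the Atom namespace ({http://www.w3.org/2005/Atom}), so Element("name") finds nothing because it looks for an unqualified name. A minimal sketch of namespace-qualified access (the URL is the placeholder from the question, and the entry/title element names assume a standard Atom feed):

```csharp
using System;
using System.Net;
using System.Xml.Linq;

class AtomExample
{
    static void Main()
    {
        // Placeholder URL from the question; substitute your real feed.
        WebRequest request = WebRequest.Create("http://www.test.com/file.xml");
        using (WebResponse response = request.GetResponse())
        using (var dataStream = response.GetResponseStream())
        {
            XElement feed = XElement.Load(dataStream);

            // Qualify element names with the Atom namespace.
            XNamespace atom = "http://www.w3.org/2005/Atom";
            foreach (XElement entry in feed.Elements(atom + "entry"))
            {
                // Guard against missing elements instead of calling .Value blindly.
                XElement title = entry.Element(atom + "title");
                if (title != null)
                    Console.WriteLine(title.Value);
            }
        }
    }
}
```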
Related
I have the following URL, which returns an XML response in C#:
http://www.bnr.ro/files/xml/years/nbrfxrates2017.xml
Now I want to retrieve the rate in euros for a date that is passed as a parameter to my function. I want to use LINQ, but I have some problems.
My function:
public static void getXml(String year, String data)
{
    WebClient webClient = new WebClient();
    string url = "http://www.bnr.ro/files/xml/years/nbrfxrates" + year + ".xml";
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    HttpWebResponse response = request.GetResponse() as HttpWebResponse;
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(response.GetResponseStream());
    XElement xe = XElement.Load(response.GetResponseStream());
    var x = (from e in xe.Elements("Cube")
             where e.Attribute("date").Value.Equals(data)
             select e.Element("Rate") into p
             where p.Element("currency").Value.Equals(data)
             select p.Value);
    Console.WriteLine(x.ToString());
}
My error is:
The root element is missing
on XElement xe = XElement.Load(response.GetResponseStream());
Method GetResponseStream returns a stream:
Gets the stream that is used to read the body of the response from the
server.
You cannot read this stream twice. When you load the XmlDocument, it reads the data from the network stream and closes it, releasing the connection. When you then try to load an XElement from the closed stream, you get this error.
You should read the response stream only once, e.g. into a string or into a MemoryStream:
string xml;
using (var reader = new StreamReader(response.GetResponseStream()))
    xml = reader.ReadToEnd();
Note: it's not clear why you need XmlDocument at all if you are using LINQ to XML.
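Putting that together, here is a sketch that reads the stream exactly once and then queries with LINQ to XML. The Cube/Rate element names and the date/currency attributes are taken from the question's own query and assumed to match the BNR schema; the feed may also declare a default namespace, which is handled below via GetDefaultNamespace:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Xml.Linq;

class BnrExample
{
    static void Main()
    {
        string url = "http://www.bnr.ro/files/xml/years/nbrfxrates2017.xml";

        // Read the response stream exactly once, into a string.
        var request = (HttpWebRequest)WebRequest.Create(url);
        string xml;
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            xml = reader.ReadToEnd();
        }

        XElement root = XElement.Parse(xml);

        // Assumption: the document declares a default namespace, so element
        // lookups must be qualified with it.
        XNamespace ns = root.GetDefaultNamespace();

        // Assumption: Cube elements carry a date attribute and contain Rate
        // elements with a currency attribute, as in the question's query.
        var euroRate =
            (from cube in root.Descendants(ns + "Cube")
             where (string)cube.Attribute("date") == "2017-03-01"
             from rate in cube.Elements(ns + "Rate")
             where (string)rate.Attribute("currency") == "EUR"
             select rate.Value).FirstOrDefault();

        Console.WriteLine(euroRate ?? "not found");
    }
}
```

Note that the question's query compared both the date and the currency against the same parameter (data); the sketch above separates the two conditions.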
I have the following code
response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
Stream responseStream = response.GetResponseStream();
string responseStr = new StreamReader(responseStream).ReadToEnd();
}
And the responseStr has the value <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">cb8fbc96-05c6-4b9f-a64d-91a2d357c398</string>
I would like to fetch this value alone cb8fbc96-05c6-4b9f-a64d-91a2d357c398.
Any help much appreciated...Thank you in advance.
It seems the service you are calling sends a header that marks the result as XML.
There are several ways to deal with this. If you control that service, just set:
Content-Type: text/plain
If you can't control it, another fairly easy way is to use a regular expression to get rid of the XML tags.
string patternXMLTag = @"<[^>]*>";
string result = Regex.Replace(responseStr, patternXMLTag, string.Empty, RegexOptions.IgnoreCase);
// result is what you want
return result;
Of course, Microsoft provides the XmlDocument and XmlElement classes, which help you pull data out of a well-formed document. You could use them to easily grab the data you want.
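For this particular response, LINQ to XML makes the regex unnecessary. A minimal sketch, assuming responseStr holds exactly the serialized string shown in the question:

```csharp
using System;
using System.Xml.Linq;

class ExtractValue
{
    static void Main()
    {
        // The serialized string from the question.
        string responseStr =
            "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">" +
            "cb8fbc96-05c6-4b9f-a64d-91a2d357c398</string>";

        // Parse the fragment and take the text content of the root element;
        // the namespace does not matter when reading Value.
        string value = XElement.Parse(responseStr).Value;
        Console.WriteLine(value); // cb8fbc96-05c6-4b9f-a64d-91a2d357c398
    }
}
```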
I'll try to explain what exactly I mean. I'm working on a program and I'm trying to download a bunch of images automatically from this site.
Namely, I want to download the big square icons from the page you get when you click on a hero name there, for example on the Darius page the image in the top left with the name DariusSquare.png and save that into a folder.
Is this possible or am I asking too much from C#?
Thank you very much!
In general, everything is possible given enough time and money. In your case, you need very little of the former and none of the latter :)
What you need to do can be described in the following high-level steps:
Get all <a> tags within the table with the heroes.
Use the WebClient class to navigate to the URLs these <a> tags point to (i.e. to the values of their href attributes) and download the HTML.
You will need to find some wrapper element that is present on each hero page and that contains the hero's image. Then you should be able to get to the image's src attribute and download it. Alternatively, perhaps each image has a common ID you can use?
I don't think anyone will provide you with an exact code that will perform these steps for you. Instead, you need to do some research of your own.
Yes, it's possible: do a C# web request and use the C# HTML Agility Pack to find the image URL.
Then you can use another web request to download the image.
Example downloading an image from a URL:
public static Image LoadImage(string url)
{
var backgroundUrl = url;
var request = WebRequest.Create(backgroundUrl);
var response = request.GetResponse();
var stream = response.GetResponseStream();
return Image.FromStream(stream);
}
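To cover the "save that into a folder" part of the question without decoding the image at all, WebClient.DownloadFile writes the bytes straight to disk. A sketch; the URL and folder below are placeholders, not real paths from the site:

```csharp
using System.IO;
using System.Net;

class SaveImageExample
{
    static void Main()
    {
        // Placeholder values; substitute the real image URL and target folder.
        string url = "http://example.com/DariusSquare.png";
        string folder = @"C:\HeroIcons";

        // No-op if the folder already exists.
        Directory.CreateDirectory(folder);

        using (var client = new WebClient())
        {
            client.DownloadFile(url, Path.Combine(folder, "DariusSquare.png"));
        }
    }
}
```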
Example using the HTML Agility Pack and getting some other data:
var request = (HttpWebRequest)WebRequest.Create(profileurl);
request.Method = "GET";
string result;
using (var response = request.GetResponse())
{
    using (var stream = response.GetResponseStream())
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            result = reader.ReadToEnd();
        }
        var doc = new HtmlDocument();
        doc.Load(new StringReader(result));
        var root = doc.DocumentNode;
        HtmlNode profileHeader = root.SelectSingleNode("//*[@id='profile-header']");
        HtmlNode profileRight = root.SelectSingleNode("//*[@id='profile-right']");
        string rankHtml = profileHeader.SelectSingleNode("//*[@id='best-team-1']").OuterHtml.Trim();
        #region GetPlayerAvatar
        var avatarMatch = Regex.Match(profileHeader.SelectSingleNode("/html/body/div/div[2]/div/div/div/div/div/span").OuterHtml, @"(portraits[^(h3)]+).*no-repeat;", RegexOptions.IgnoreCase);
        if (avatarMatch.Success)
        {
            battleNetPlayerFromDB.PlayerAvatarCss = avatarMatch.Value;
        }
        #endregion
    }
}
I'm just trying to learn about the HTML Agility Pack and XPath. I'm attempting to get a list of (HTML links to) companies from the NASDAQ website:
http://www.nasdaq.com/quotes/nasdaq-100-stocks.aspx
I currently have the following code:
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
// Create a request for the URL.
WebRequest request = WebRequest.Create("http://www.nasdaq.com/quotes/nasdaq-100-stocks.aspx");
// Get the response.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();
// Read into a HTML store read for HAP
htmlDoc.LoadHtml(responseFromServer);
HtmlNodeCollection tl = htmlDoc.DocumentNode.SelectNodes("//*[@id='indu_table']/tbody/tr[*]/td/b/a");
foreach (HtmlAgilityPack.HtmlNode node in tl)
{
Debug.Write(node.InnerText);
}
// Cleanup the streams and the response.
reader.Close();
dataStream.Close();
response.Close();
I've used an XPath addon for Chrome to get the XPath of:
//*table[@id='indu_table']/tbody/tr[*]/td/b/a
When running my project, I get an XPath unhandled exception about it being an invalid token.
I'm a little unsure what's wrong with it; I've tried to put a number in the tr[*] section above, but I still get the same error.
I've been looking at this for the last hour; is it anything simple?
Thanks
Since the data comes from JavaScript, you have to parse the JavaScript and not the HTML, so the Agility Pack doesn't help all that much, though it makes things a bit easier. The following is how it could be done using the Agility Pack and Newtonsoft JSON.NET to parse the JavaScript.
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.Load(new WebClient().OpenRead("http://www.nasdaq.com/quotes/nasdaq-100-stocks.aspx"));
List<string> listStocks = new List<string>();
HtmlNode scriptNode = htmlDoc.DocumentNode.SelectSingleNode("//script[contains(text(),'var table_body =')]");
if (scriptNode != null)
{
    // Using Regex here to get just the array we're interested in...
    string stockArray = Regex.Match(scriptNode.InnerText, "table_body = (?<Array>\\[.+?\\]);").Groups["Array"].Value;
    JArray jArray = JArray.Parse(stockArray);
    foreach (JToken token in jArray.Children())
    {
        listStocks.Add("http://www.nasdaq.com/symbol/" + token.First.Value<string>().ToLower());
    }
}
To explain in a bit more detail: the data comes from one big JavaScript array on the page, var table_body = [....
Each stock is one element in the array and is an array itself:
["ATVI", "Activision Blizzard, Inc", 11.75, 0.06, 0.51, 3058125, 0.06, "N", "N"]
So by parsing the array, taking the first element of each entry, and appending the fixed URL prefix, we get the same result as the JavaScript.
Why don't you just use the Descendants("a") method?
It's much simpler and more object oriented: you'll just get a bunch of node objects, and you can then read the "href" attribute from each of them.
Sample code:
foreach (HtmlNode a in htmlDoc.DocumentNode.Descendants("a"))
{
    string href = a.GetAttributeValue("href", null);
}
If you just need a list of links from a certain webpage, this method will do just fine.
If you look at the page source for that URL, there's not actually an element with id='indu_table'. It appears to be generated dynamically (i.e. by JavaScript); the HTML that you get when loading directly from the server will not reflect anything that's changed by client script. This is probably why it's not working.
I am doing the following:
System.Net.WebRequest myRequest = System.Net.WebRequest.Create("http://www.atlantawithkid.com/feed/");
System.Net.WebResponse myResponse = myRequest.GetResponse();
System.IO.Stream rssStream = myResponse.GetResponseStream();
System.Xml.XmlDocument rssDoc = new System.Xml.XmlDocument();
rssDoc.Load(rssStream);
System.Xml.XmlNodeList rssItems = rssDoc.SelectNodes("rss/channel/item");
System.Xml.XmlNode rssDetail;
// FEED DESCRIPTION
string sRssDescription;
rssDetail = rssItems.Item(0).SelectSingleNode("description");
if (rssDetail != null)
sRssDescription = rssDetail.InnerText;
But when I read the "description" node and view the InnerText or the InnerXml, the string is different from what is in the original XML document.
The returned string has an ellipsis and the data is truncated. However, in the original XML document there is data that I can see.
Is there a way to select this node without the data being altered?
Thanks for the help.
I suspect you're looking at the string in the debugger, and that it is truncating the data. (Or you're writing it into something else that truncates text.)
I very much doubt that this is an XmlDocument problem.
I suggest you log the InnerText somewhere you know you'll be able to get the full data out, so you can tell for sure.
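A minimal sketch of that suggestion: dump the value to a file and check its length there instead of trusting the debugger display. This reuses the feed URL and XPath from the question:

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

class DumpDescription
{
    static void Main()
    {
        var request = WebRequest.Create("http://www.atlantawithkid.com/feed/");
        var rssDoc = new XmlDocument();
        using (var response = request.GetResponse())
        using (var rssStream = response.GetResponseStream())
        {
            rssDoc.Load(rssStream);
        }

        // First item's description, as in the question.
        XmlNode rssDetail = rssDoc.SelectSingleNode("rss/channel/item/description");
        if (rssDetail != null)
        {
            // Write to a file instead of relying on the debugger display,
            // which truncates long strings.
            File.WriteAllText("description.txt", rssDetail.InnerText);
            Console.WriteLine("Wrote {0} characters.", rssDetail.InnerText.Length);
        }
    }
}
```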