C# downloading page returns old page

I have a problem: I wrote a method to get the current song on a Czech radio station. They do not have an API, so I had to get the song from the HTML using HtmlAgilityPack.
The problem is that even though the song title changes on the page, my method downloads the old page. I usually have to wait about 20 seconds with my app closed before it works.
I suspected a caching problem, but I could not fix it.
I also tried the DownloadString method, but it did not return a fresh page either.
public static string[] GetEV2Songs()
{
    List<string> songy = new List<string>();
    string urlAddress = "http://www.evropa2.cz/";
    string data = "";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = null;
        if (response.CharacterSet == null)
            readStream = new StreamReader(receiveStream);
        else
            readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
        data = readStream.ReadToEnd();
        response.Close();
        readStream.Close();
    }
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(data);
    string temp = "";
    foreach (var node in doc.DocumentNode.SelectNodes("//body//h2"))
    {
        if (node.InnerText.Contains("&ndash"))
        {
            temp = node.InnerText.Replace("&ndash;", "-");
            songy.Add(temp);
        }
    }
    return songy.ToArray();
}

This sounds like a caching problem. Try replacing the 4th line of your method with something like this, so that every request uses a unique URL:
string urlAddress = "http://www.evropa2.cz/?_=" + System.Guid.NewGuid().ToString();
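A slightly more general sketch of the same idea follows. The class and method names (FreshDownload, AddCacheBuster, CreateUncachedRequest) are made up for illustration. Setting CachePolicy is an additional assumption on my part: it asks the .NET Framework HTTP stack not to use its own cache, but it may have no effect on other runtimes and does not control caches on the server side.

```csharp
using System;
using System.Net;
using System.Net.Cache;

public static class FreshDownload
{
    // Appends a unique "_" query parameter so that no cache sees the same URL twice.
    // Assumes the URL has no "#fragment" part.
    public static string AddCacheBuster(string url)
    {
        string separator = url.Contains("?") ? "&" : "?";
        return url + separator + "_=" + Guid.NewGuid().ToString("N");
    }

    // Builds a request that also asks the local HTTP stack to bypass its cache.
    public static HttpWebRequest CreateUncachedRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(AddCacheBuster(url));
        request.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.NoCacheNoStore);
        return request;
    }
}
```

In GetEV2Songs, the request could then be created with FreshDownload.CreateUncachedRequest(urlAddress) instead of WebRequest.Create(urlAddress).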

Related

Webpage access with other methods?

I have this new class, "WebPage". I am calling it from a button event in Form1.
The problem is that I repeat this entire website identification process every time I go to a new page on that website. I think this big method is needed only once, when I first access the website. My intuition says that I should use something lighter than this big block of code for the other pages on the same website. Am I wrong, and should I stick with what I already have working? Or am I right, and there are other methods to use after this one?
Thank you!
(C# code)
public class WebPage
{
    public string GetText(string url)
    {
        //Special webpage Reading (extract info from page)
        HttpWebRequest request;
        HttpWebResponse response = null;
        Stream stream = null;
        request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = "Foo";
        request.Accept = "*/*";
        response = (HttpWebResponse)request.GetResponse();
        stream = response.GetResponseStream();
        StreamReader sr = new StreamReader(stream, System.Text.Encoding.Default);
        string text = sr.ReadToEnd();
        sr.Close();
        if (stream != null) stream.Close();
        if (response != null) response.Close();
        return text;
    }
}

How can I store a part of a very large HTML stream?

I have to get the HTML of a web page and then find this element:
<span class='uccResultAmount'>0,896903</span>
I have tried regular expressions.
I have also tried streams, that is, storing the whole HTML in a string. However, the HTML seems to be too large for a string, because the amount 0,896903 I am looking for does not appear in the string.
Is there any way to read only a small block of the stream?
Part of the method:
public static string getValue()
{
    string data = "not found";
    string urlAddress = "http://www.xe.com/es/currencyconverter/convert/?Amount=1&From=USD&To=EUR";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = null;
        if (response.CharacterSet == null)
        {
            readStream = new StreamReader(receiveStream);
        }
        else
        {
            readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
        }
        data = readStream.ReadToEnd(); // the string in which I should search for the amount
        response.Close();
        readStream.Close();
    }
If you know an easier way to solve my problem, please let me know.

I would use HtmlAgilityPack and XPath:
var web = new HtmlAgilityPack.HtmlWeb();
var doc = web.Load("http://www.xe.com/es/currencyconverter/convert/?Amount=1&From=USD&To=EUR");
var value = doc.DocumentNode.SelectSingleNode("//span[@class='uccResultAmount']")
    .InnerText;
A LINQ version is also possible:
var value = doc.DocumentNode.Descendants("span")
    .Where(s => s.Attributes["class"] != null && s.Attributes["class"].Value == "uccResultAmount")
    .First()
    .InnerText;
Don't use the following in practice. It is only here to show that the claim
"But the problem is that this html code does not fit in a single string"
is not correct:
string html = new WebClient().DownloadString("http://www.xe.com/es/currencyconverter/convert/?Amount=1&From=USD&To=EUR");
var val = Regex.Match(html, @"<span[^>]+?class='uccResultAmount'>(.+?)</span>")
    .Groups[1]
    .Value;

How does cite bite achieve its cache?

Can anyone shed light on how citebite achieves its cache, and in particular how it is able to display the cached page with the same layout as the original page?
I am looking to achieve something very similar. I pull the HTML from the source using
public static string sourceCache (string URL)
{
    string sourceURL = URL;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(sourceURL);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = null;
        if (response.CharacterSet == null)
        {
            readStream = new StreamReader(receiveStream);
        }
        else
        {
            readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
        }
        string data = readStream.ReadToEnd();
        response.Close();
        readStream.Close();
        return data;
    }
    return "couldn't retrieve cache";
}
}
which I then send to my database, storing it as nvarchar(max). When loading the page to display the cache, I pull the field and set it as the InnerHtml of a div.
However, whereas citebite's cache retains the styling and layout of the source page, mine does not.
Where am I going wrong?
I have an ASP.NET 4.5 C# Web Forms website.

Create one for this page and look at the source. The secret is
<base href="http://stackoverflow.com/questions/28432505/how-does-cite-bite-acheive-its-cache" />
The HTML Base Element (<base>) specifies the base URL to use for all
relative URLs contained within a document. There is maximum one
<base> element in a document.
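To see what the base URL changes, here is a small sketch that resolves relative URLs the way a browser would against a base href. The class name BaseHrefDemo and the two relative paths in the usage note are made up for illustration.

```csharp
using System;

public static class BaseHrefDemo
{
    // Resolves a relative URL against a base URL, which is what <base href> asks the browser to do.
    public static string Resolve(string baseHref, string relativeUrl)
    {
        return new Uri(new Uri(baseHref), relativeUrl).AbsoluteUri;
    }
}
```

With the base href above, Resolve(baseHref, "/content/site.css") gives http://stackoverflow.com/content/site.css and Resolve(baseHref, "img/logo.png") gives http://stackoverflow.com/questions/28432505/img/logo.png. Without a base element, the same relative URLs would resolve against the page that displays the cached copy, so stylesheets and images are looked up on the wrong site.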

As per @Alex K above, the base element appears to be the issue.
I have amended the code to check whether the existing HTML contains "base href" and, if not, to insert a base element with the href set to the source URL:
public static string sourceCache (string URL)
{
    string sourceURL = URL;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(sourceURL);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = null;
        if (response.CharacterSet == null)
        {
            readStream = new StreamReader(receiveStream);
        }
        else
        {
            readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
        }
        string data = readStream.ReadToEnd();
        response.Close();
        readStream.Close();
        if (data.Contains("base href"))
        {
            return data;
        }
        else
        {
            //we need to insert the base href with the source url
            data = basecache(data, URL);
            return data;
        }
    }
    return "couldn't retrieve cache";
}
public static string basecache (string htmlsource, string urlsource)
{
    //make sure there is a head tag; ignore case so both <head> and <HEAD> match
    int headtag = htmlsource.IndexOf("<head>", StringComparison.OrdinalIgnoreCase);
    if (headtag != -1)
    {
        string newhtml = htmlsource.Insert(headtag + "<head>".Length, "<base href='" + urlsource + "'/>");
        return newhtml;
    }
    else
    {
        return htmlsource;
    }
}
So far I've only tested it on a few sites/domains, but it appears to work. Thank you so much, Alex, for your help.

How can I get the HTML of the page? [duplicate]

This question already has answers here:
How to get Url Hash (#) from server side
(6 answers)
Closed 9 years ago.
Here is the code I use to perform a web request. I'm getting all of the HTML except for the comments section in the URL.
HttpWebRequest req = (HttpWebRequest) HttpWebRequest.Create(
    "http://u-handbag.typepad.com/uhandblog/2013/11/choosing-bag-fabrics.html#comment-6a00d8341c574653ef019b022fc96f970d"
);
StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream());
htl = reader.ReadToEnd();
Can anyone explain why?

Use this chunk of code. The variable result should contain the HTML.
System.Net.WebClient webClient = new System.Net.WebClient();
string result = webClient.DownloadString(URL);

To get the HTML of a web page, you can use code like this:
string urlAddress = "http://u-handbag.typepad.com/uhandblog/2013/11/choosing-bag-fabrics.html#comment-6a00d8341c574653ef019b022fc96f970d";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;
    if (response.CharacterSet == null)
        readStream = new StreamReader(receiveStream);
    else
        readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
    string data = readStream.ReadToEnd();
    response.Close();
    readStream.Close();
}
Or, better, use WebClient:
using System.Net;
using (WebClient client = new WebClient())
{
    string htmlCode = client.DownloadString("http://u-handbag.typepad.com/uhandblog/2013/11/choosing-bag-fabrics.html#comment-6a00d8341c574653ef019b022fc96f970d");
}
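As the linked duplicate suggests, the part of a URL after # is a fragment that the client handles itself; it is not sent to the server, so it does not change which HTML comes back. The sketch below only splits a URL to make that visible. The class name FragmentDemo and its methods are made up for illustration.

```csharp
using System;

public static class FragmentDemo
{
    // Returns the part of the URL that identifies the resource on the server (no fragment).
    public static string WithoutFragment(string url)
    {
        return new Uri(url).GetLeftPart(UriPartial.Query);
    }

    // Returns the fragment, including the leading '#', or an empty string if there is none.
    public static string FragmentOf(string url)
    {
        return new Uri(url).Fragment;
    }
}
```

For the URL in the question, WithoutFragment returns the address ending in choosing-bag-fabrics.html, and FragmentOf returns the #comment-... part. If the comments are absent from the downloaded HTML, changing the fragment will not bring them back.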

How to search for HTML elements in a StreamReader or string

I've been writing a simple web crawler, and I need to search for elements inside my StreamReader or string. For example, I need to get all the content inside a div with the id "bodyDiv". Which tool can help me with this?
private static string GetPage(string url)
{
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
    request.UserAgent = "Simple crawler";
    WebResponse response = request.GetResponse();
    Stream stream = response.GetResponseStream();
    StreamReader reader = new StreamReader(stream);
    string htmlText = reader.ReadToEnd();
    return htmlText;
}

I would use HtmlAgilityPack:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlText);
var div = doc.DocumentNode.SelectSingleNode("//div[@id='bodyDiv']");
if (div != null)
{
    var yourtext = div.InnerText;
}
