How to use a proxy with HtmlAgilityPack - c#

I need to use a proxy with HtmlAgilityPack.
I pass a link (RefURL) to my app, and I want the app to fetch that URL through a proxy address, for instance "101.109.44.157:8080".
I searched and found this:
WebClient wc = new WebClient();
wc.Proxy = new WebProxy(host,port);
var page = wc.DownloadString(url);
and used it like this.
RefURL = new Uri(refLink.Text);
WebClient wc = new WebClient();
wc.Proxy = new WebProxy("101.109.44.157:8080");
var page = wc.DownloadString(RefURL);
RefURL.ToString();
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(RefURL.ToString());
but it does not work!

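The proxy IP is not responding, but you are also not passing the proxy to HtmlAgilityPack in this line:
HtmlAgilityPack.HtmlDocument doc = web.Load(RefURL.ToString());
It should be (webProxy is a System.Net.WebProxy; the last argument is the credentials, null if the proxy needs none):
HtmlAgilityPack.HtmlDocument doc = web.Load(RefURL.ToString(), "GET", webProxy, null);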
The first step is finding a list of fresh proxy IPs, for example:
https://geonode.com/free-proxy-list/
https://free-proxy-list.net/uk-proxy.html
https://hidemy.name/en/proxy-list/
http://free-proxy.cz
http://nntime.com
Most of these addresses only work for a few hours. To check one, set it as the proxy in your browser and open an IP-lookup page such as the one used below. If the proxy is anonymous, that page should be unable to detect your real location and IP.
Once you have a proxy IP and port that work, you can create a WebProxy object or simply pass the IP and port.
string RefURL = "https://www.whatismyip.com/";
string myProxyIP = "119.81.197.124"; //check this is still available
int myPort = 3128;
string userId = string.Empty; //leave it blank
string password = string.Empty;
try
{
    HtmlWeb web = new HtmlWeb();
    var doc = web.Load(RefURL.ToString(), myProxyIP, myPort, userId, password);
    Console.WriteLine(doc.DocumentNode.InnerHtml);
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
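As a sketch of the "create a WebProxy object" option mentioned above: the question starts from a single "host:port" string, so a small helper can split it before building the proxy. ProxyParser and FromHostPort are hypothetical names used only for illustration; the only library calls are the WebProxy(string, int) constructor and standard string parsing.

```csharp
using System;
using System.Net;

public static class ProxyParser
{
    // Hypothetical helper: turns "host:port" (e.g. "101.109.44.157:8080") into a WebProxy.
    public static WebProxy FromHostPort(string hostPort)
    {
        if (string.IsNullOrWhiteSpace(hostPort))
            throw new ArgumentException("Expected \"host:port\".", nameof(hostPort));

        hostPort = hostPort.Trim();

        // The port is whatever follows the last colon.
        int colon = hostPort.LastIndexOf(':');
        if (colon <= 0 || colon == hostPort.Length - 1)
            throw new FormatException("Expected \"host:port\", got: " + hostPort);

        string host = hostPort.Substring(0, colon);
        int port;
        if (!int.TryParse(hostPort.Substring(colon + 1), out port) || port < 1 || port > 65535)
            throw new FormatException("Invalid port in: " + hostPort);

        return new WebProxy(host, port);
    }
}
```

The returned object can be assigned to WebClient.Proxy, or passed to an HtmlWeb.Load overload that accepts a WebProxy, assuming your HtmlAgilityPack version provides one.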

Related

C# HtmlAgilityPack timeout before download page

I want to parse the site https://russiarunning.com/events?d=run in C# with HtmlAgilityPack.
I tried this:
string url = "https://russiarunning.com/events?d=run";
var web = new HtmlWeb();
var doc = web.Load(url);
But there is a problem: the content on the site loads after a delay of about 1000 ms,
so when I use web.Load(url) I download the page without the content.
How can I set a timeout before downloading the page with HtmlAgilityPack?
Try this.
Create a class like the one below:
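public class WebClientHelper : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests support automatic decompression; the cast yields null otherwise.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}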
and use as below:
var data = new Helpers.WebClientHelper().DownloadString(Url);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(data);
You can simply do this:
string url = "https://russiarunning.com/events?d=run";
var web = new HtmlWeb();
web.PreRequest = delegate(HttpWebRequest webReq)
{
    webReq.Timeout = 4000; // number of milliseconds
    return true;
};
var doc = web.Load(url);
More on Timeout property: https://learn.microsoft.com/en-us/dotnet/api/system.net.httpwebrequest.timeout?view=netframework-4.7.2

HtmlAgilityPack doesn't get XPath in C#

Previously, this code could select nodes by XPath from the website. But today, when I debugged it, I saw that it no longer gets the HTML from webtruyen.com. I checked website.com/robots.txt but found nothing suspicious. I also tried adding a proxy to get the data, but the returned data is null. I don't know how to select nodes by XPath from webtruyen.com. Can anyone help? I want to know how to read data from http://webtruyen.com.
My code:
string url = "http://webtruyen.com";
var web = new HtmlWeb();
var doc = web.Load(url);
String temps = "";
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a"))
{
temps = node.InnerHtml;
}
When I debug, I get:
InnerHtml 'doc.DocumentNode.InnerHtml' threw an exception of type 'System.NullReferenceException' string {System.NullReferenceException}
My code using a proxy:
string url = "http://webtruyen.com";
var web = new HtmlWeb();
web.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)";
var doc = web.Load(url);
String temps = "";
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a"))
{
temps = node.InnerHtml;
}
I have the same error using HtmlWeb.Load(), but I can easily solve your issue using HttpWebRequest (TLDR: See #3 for the working code).
Step 1) Using the following code:
HttpWebRequest hwr = (HttpWebRequest)WebRequest.Create("http://webtruyen.com");
using (Stream s = hwr.GetResponse().GetResponseStream())
{ }
You see that you actually get a 403 Forbidden error (WebException).
Step 2)
HttpWebRequest hwr = (HttpWebRequest)WebRequest.Create("http://webtruyen.com");
HtmlDocument doc = new HtmlDocument();
try
{
using (Stream s = hwr.GetResponse().GetResponseStream())
{ }
}
catch (WebException wx)
{
doc.LoadHtml(new StreamReader(wx.Response.GetResponseStream()).ReadToEnd());
}
Inspecting doc.DocumentNode.OuterHtml, you see the HTML of the forbidden error, with the JavaScript that sets a cookie in your browser and refreshes the page.
Step 3) So in order to load the page outside of a browser, you have to set that cookie manually and request the page again:
string cookie = string.Empty;
HttpWebRequest hwr = (HttpWebRequest)WebRequest.Create("http://webtruyen.com");
try
{
    using (Stream s = hwr.GetResponse().GetResponseStream())
    { }
}
catch (WebException wx)
{
    cookie = Regex.Match(new StreamReader(wx.Response.GetResponseStream()).ReadToEnd(), "document.cookie = '(.*?)';").Groups[1].Value;
}
hwr = (HttpWebRequest)WebRequest.Create("http://webtruyen.com");
hwr.Headers.Add("Cookie", cookie);
HtmlDocument doc = new HtmlDocument();
using (Stream s = hwr.GetResponse().GetResponseStream())
using (StreamReader sr = new StreamReader(s))
{
    doc.LoadHtml(sr.ReadToEnd());
}
You get the page :)
Moral of the story, if your browser can do it, so can you.
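The cookie-extraction step can be tried in isolation, without hitting the site. The sketch below applies the same kind of regular expression to a sample string; the sample HTML, the cookie value, and the names CookieExtractor and ExtractCookie are made up for illustration and are not what webtruyen.com actually returns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CookieExtractor
{
    // Returns the value assigned to document.cookie in the given HTML,
    // or an empty string if no such assignment is found.
    public static string ExtractCookie(string html)
    {
        // The dot is escaped and whitespace around '=' is optional,
        // so small formatting differences in the script still match.
        Match m = Regex.Match(html ?? string.Empty, @"document\.cookie\s*=\s*'(.*?)';");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For a body such as <script>document.cookie = 'token=abc123';location.reload();</script> this returns token=abc123, which is the string to send back in the Cookie header.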

get html source through proxy

Hi, how do I get the source of an HTML page through a proxy? When I use the code below I get an error saying "Proxy Authentication Required", and I have to go through a proxy.
Dim client As New WebClient()
Dim htmlCode As String = client.DownloadString("http://www.stackoverflow.com")
Then use a proxy that does not need authentication.
See here for more info:
http://msdn.microsoft.com/en-us/library/system.net.webclient.proxy.aspx
string source = GetPageSource("http://www.stackoverflow.com");
private string GetPageSource(string url)
{
    string htmlSource = string.Empty;
    try
    {
        System.Net.WebProxy myProxy = new System.Net.WebProxy("Proxy IP", 8080);
        using (System.Net.WebClient client = new System.Net.WebClient())
        {
            client.Proxy = myProxy;
            client.Proxy.Credentials = new System.Net.NetworkCredential("username", "password");
            htmlSource = client.DownloadString(url);
        }
    }
    catch (WebException ex)
    {
        // log any exceptions
    }
    return htmlSource;
}
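"Proxy Authentication Required" is HTTP status 407: the proxy itself is asking for credentials. A minimal sketch of building the proxy object separately from the download, so the same code serves both authenticated and anonymous proxies, might look like this. ProxyFactory and Create are hypothetical names; the host, port, user name, and password are placeholders.

```csharp
using System;
using System.Net;

public static class ProxyFactory
{
    // Builds a WebProxy; credentials are attached only when a user name is supplied.
    public static WebProxy Create(string host, int port, string userName = "", string password = "")
    {
        var proxy = new WebProxy(host, port);
        if (!string.IsNullOrEmpty(userName))
        {
            // These credentials are sent to the proxy, not to the target site.
            proxy.Credentials = new NetworkCredential(userName, password);
        }
        return proxy;
    }
}
```

The result can be assigned to client.Proxy in GetPageSource in place of the inline construction.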

Get location of client machine using ip address

I am trying to get the location of a client machine using its IP address. The client can access the internet only after providing proxy authentication.
Say the client needs to open 'www.google.com' in the browser: an "Authentication Required" prompt opens immediately, and the client enters a username and password. However, some users may not need to authenticate in order to access the internet.
This segment of code did not help me:
string url = "http://freegeoip.net/xml/";
WebClient wc = new WebClient();
WebProxy proxyObj = new WebProxy("http://freegeoip.net/xml/");
proxyObj.Credentials = CredentialCache.DefaultCredentials;
Uri uri = new Uri(url);
MemoryStream ms = new MemoryStream(wc.DownloadData(uri));
XmlTextReader rdr = new XmlTextReader(url);
XmlDocument doc = new XmlDocument();
ms.Position = 0;
doc.Load(ms);
ms.Dispose();
In the above code, if I add a network credential instance with a username, password and domain, it works perfectly.
Instead of hard-coding the credentials, I need to get the username and password from the users (on the client machine).
My question is how to show the "Authentication Required" prompt and get the username and password to use for the download from the URL.
I would be glad if someone could shed light on this issue.
Edit: I managed to get the basic authentication prompt to appear, and I can now read the username and password to use as credentials:
try
{
var reg = HttpContext.Current.Request;
if (!String.IsNullOrEmpty(reg.Headers["Authorization"]))
{
var cred = System.Text.ASCIIEncoding.ASCII.GetString(Convert.FromBase64String(Request.Headers["Authorization"].Substring(6))).Split(':');
var user = new { Name = cred[0], Pass = cred[1] };
string url = "http://freegeoip.net/xml/";
WebClient wc = new WebClient();
WebProxy wProxy = new WebProxy();
ICredentials crd;
crd = new NetworkCredential("'" + cred[0] + "'", "'" + cred[1] + "'");
wProxy = new WebProxy("myproxy", true, null, crd);
wc.Proxy = wProxy;
Uri uri = new Uri(url);
string content = wc.DownloadString(uri);
}
else
{
try
{
//var reg = HttpContext.Current.Request;
if (String.IsNullOrEmpty(reg.Headers["Authorization"]))
{
var res = HttpContext.Current.Response;
res.StatusCode = 401;
res.AddHeader("WWW-Authenticate", "Basic realm = \"freegeoip\"");
//res.End();
}
}
catch (Exception ex)
{
}
}
}
catch(Exception ex)
{
}
But it still throws "Unable to connect to the remote server".
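One detail in the edited code worth isolating is the parsing of the Authorization header. The code above wraps the user name and password in single quotes before building the NetworkCredential, which changes their values, and Split(':') cuts a password that itself contains a colon. A hedged sketch of parsing the header without either problem follows; BasicAuthHeader and TryParse are hypothetical names, and UTF-8 decoding is an assumption. It does not address the connection error itself, which may simply mean the proxy address is not reachable.

```csharp
using System;
using System.Text;

public static class BasicAuthHeader
{
    // Parses "Basic <base64(user:password)>" into its two parts.
    // Returns false if the header is missing or malformed.
    public static bool TryParse(string header, out string user, out string password)
    {
        user = string.Empty;
        password = string.Empty;

        const string prefix = "Basic ";
        if (string.IsNullOrEmpty(header) ||
            !header.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        string decoded;
        try
        {
            byte[] raw = Convert.FromBase64String(header.Substring(prefix.Length).Trim());
            decoded = Encoding.UTF8.GetString(raw);
        }
        catch (FormatException)
        {
            return false; // not valid Base64
        }

        // Split on the first colon only: the password may contain colons.
        int colon = decoded.IndexOf(':');
        if (colon < 0)
        {
            return false;
        }

        user = decoded.Substring(0, colon);
        password = decoded.Substring(colon + 1);
        return true;
    }
}
```

The parsed values can then be passed unmodified to new NetworkCredential(user, password).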

How to read XML from the internet using a Web Proxy?

This is a follow-up to this question: How to load XML into a DataTable?
I want to read an XML file on the internet into a DataTable. The XML file is here: http://rates.fxcm.com/RatesXML
If I do:
public DataTable GetCurrentFxPrices(string url)
{
WebProxy wp = new WebProxy("http://mywebproxy:8080", true);
wp.Credentials = CredentialCache.DefaultCredentials;
WebClient wc = new WebClient();
wc.Proxy = wp;
MemoryStream ms = new MemoryStream(wc.DownloadData(url));
DataSet ds = new DataSet("fxPrices");
ds.ReadXml(ms);
DataTable dt = ds.Tables["Rate"];
return dt;
}
It works fine. I'm struggling with how to use the default proxy set in Internet Explorer. I don't want to hard-code the proxy. I also want the code to work if no proxy is specified in Internet Explorer.
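You can use Console.WriteLine(System.Net.WebProxy.GetDefaultProxy().Address.AbsoluteUri); ...
Note that WebProxy.GetDefaultProxy() is marked obsolete in the .NET Framework; WebRequest.GetSystemWebProxy() and WebRequest.DefaultWebProxy are the newer ways to read the system proxy.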
Add the following setting to your app.config/web.config to use the system default proxy automatically:
<system.net>
<defaultProxy useDefaultCredentials="true"/>
</system.net>
#region Function to get x-rate via proxy
public string fncProxyGetRate(string countryCode)// use 'GBP' for British Pounds
{
    string rtnTxt = "";
    try
    {
        string url = "http://rss.timegenie.com/forex.xml";
        string proxyUrl = "http://xxx.xxx.x.x:8080/";
        string myXratePath = "/forex/data/code[text()='" + countryCode + "']";
        WebProxy wp = new WebProxy(proxyUrl, true);
        wp.Credentials = CredentialCache.DefaultCredentials;
        WebClient wc = new WebClient();
        wc.Proxy = wp;
        MemoryStream ms = new MemoryStream(wc.DownloadData(url));
        XmlTextReader rdr = new XmlTextReader(ms);
        XmlDocument doc = new XmlDocument();
        doc.Load(rdr);
        rtnTxt = doc.SelectSingleNode(myXratePath).ParentNode.SelectSingleNode("rate").InnerXml;
    }
    catch (Exception ex)
    {
        rtnTxt = ex.Message;
    }
    return rtnTxt;
}
#endregion
