I'm getting an exception when I run this code:
Exception: "This header must be modified using the appropriate property or method."
// Load a page with HtmlAgilityPack while sending browser-like headers.
// Restricted headers (User-Agent, Accept, ...) must be set through their
// dedicated properties; adding them via Headers.Add throws
// "This header must be modified using the appropriate property or method."
HtmlAgilityPack.HtmlWeb web = new HtmlWeb();
// A string literal cannot span source lines unless it is verbatim (@"...");
// keep the user-agent on one line.
web.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36";
web.PreRequest += (request) =>
{
    // Accept is a restricted header on HttpWebRequest: use the Accept property,
    // not Headers.Add("Accept", ...) — that Add call was the source of the exception.
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    // Accept-Language is not restricted, so Headers.Add is fine here.
    request.Headers.Add("Accept-Language", "de-DE");
    return true;
};
// web.Load creates and returns the document; no need to new one up first.
HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.alfatah.pk/");
This works for me in https://dotnetfiddle.net/AQbs3v :
// Working variant: fetch the page with HttpWebRequest, setting the restricted
// headers (User-Agent, Accept) through their dedicated properties.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.alfatah.pk/");
// Use a well-formed user-agent: the original "Mozilla / 5.0(..." contained
// stray spaces, which some servers reject as malformed.
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
// Accept-Language is not a restricted header, so Headers.Add is allowed.
//request.Headers.Add(HttpRequestHeader.AcceptLanguage, "de-DE");
using (var response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.OK)
    {
        // Dispose the reader as well as the response.
        using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.OptionFixNestedTags = true; // repair malformed nesting before querying
            htmlDoc.Load(sr);
            Console.Write(htmlDoc.DocumentNode.InnerText);
        }
    }
}
Related
I have the following function written in c#:
/// <summary>
/// Calls the auth endpoint with the cookies passed as one raw Cookie header
/// and returns the HttpResponseMessage rendered as a string (status line plus
/// headers, not the body), or null on failure.
/// </summary>
public static string FunctionName()
{
    try
    {
        // NOTE(review): creating a new HttpClient per call risks socket
        // exhaustion; prefer one shared instance or IHttpClientFactory.
        using (var httpClient = new HttpClient())
        {
            var uri = new Uri("https://www.website.com/api/1/auth/user");
            httpClient.BaseAddress = uri;
            httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36");
            // The cookies are sent as a single raw header string here; the
            // container-based variant below replaces this.
            httpClient.DefaultRequestHeaders.Add("Cookie", "apiKey=dasdasd; auth=authcookie_dasd-c28673189043; id_chat.com=dasdasdad");
            // Blocking on .Result is a deadlock risk under a synchronization
            // context; acceptable only in console-style code.
            return httpClient.GetAsync(uri).Result.ToString();
        }
    }
    catch (Exception ex)
    {
        // Message is already a string; ToString() on it was redundant.
        Console.WriteLine(ex.Message);
    }
    return null;
}
where the cookie is added directly to the header. Now I was trying to split this up to add it to a CookieContainer. The source code looks like the following:
/// <summary>
/// Same call as above, but with the cookies supplied through a CookieContainer
/// instead of a raw header. Returns the deserialized user id, or null on failure.
/// </summary>
public static string FunctionName()
{
    try
    {
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        Uri target = new Uri("https://www.website.com");
        // Container cookies are matched by domain/path. Domain = target.Host
        // scopes them to "www.website.com" exactly; if the server issued them
        // for a parent domain (".website.com"), this scoping could be why the
        // container variant gets 401 while the raw-header variant works —
        // verify against the server's Set-Cookie response.
        handler.CookieContainer.Add(new Cookie("apiKey", "dasdasd") { Domain = target.Host });
        handler.CookieContainer.Add(new Cookie("auth", "authcookie_dasd-c28673189043") { Domain = target.Host });
        handler.CookieContainer.Add(new Cookie("id_chat.com", "dasdasdad") { Domain = target.Host });
        // Disposing the client also disposes the handler it owns.
        using (var http = new HttpClient(handler))
        {
            http.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36");
            // Block once and reuse the response instead of repeated .Result calls.
            var response = http.GetAsync("https://www.website.com/api/1/auth/user").Result;
            Console.WriteLine(response.ToString());
            if (response.StatusCode == HttpStatusCode.OK)
            {
                var json = JsonConvert.DeserializeObject<LoginResponse>(response.Content.ReadAsStringAsync().Result);
                return json.id;
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    return null;
}
The first function would return the status "OK".
The second function would return the status "Unauthorized".
What is causing this problem? Did I set up the CookieContainer wrong?
I wrote a batch job that parses HTML pages of gearbest.com to extract item data (example link link).
It worked until 2–3 weeks ago, when the site was updated.
Since then I can't download the pages to parse, and I don't understand why.
Before the update I did request with the following code with HtmlAgilityPack.
HtmlWeb web = new HtmlWeb();
// Load both creates and returns the document; the null-initialized local was redundant.
HtmlDocument doc = web.Load(url); // this is the point where the exception is thrown
I tried without the framework, adding some data to the request:
// Manual request with browser-like headers.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
// Assign the container once; it was assigned twice in the original.
request.CookieContainer = new CookieContainer();
request.Headers.Add("accept-language", "it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7");
request.Headers.Add("accept-encoding", "gzip, deflate, br");
request.Headers.Add("upgrade-insecure-requests", "1");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
// GetResponse returns a WebResponse ('Response' is not a .NET type); dispose it.
using (WebResponse response = request.GetResponse()) // exception thrown here
{
}
the exception is:
IOException: Unable to read data from the transport connection
SocketException: The connection could not be established.
If I try to request the main page (https://it.gearbest.com) it works.
What's the problem in your opinion?
For some reason it doesn't like the provided user agent. If you omit setting UserAgent everything works fine
// Workaround 1: leave UserAgent unset (the commented-out line is the point) —
// the request then succeeds against this server.
HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
Another solution would be setting request.Connection to a random string (but not keep-alive or close)
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
// Workaround 2: any custom Connection value also works here. Note the property
// itself rejects "keep-alive" and "close" (use KeepAlive for those).
request.Connection = "random value";
It also works but I cannot explain why.
Might be worth a try...
// Workaround 3: drop keep-alive and force HTTP/1.0 — this can sidestep servers
// that reset HTTP/1.1 connections (see linked answer).
HttpRequest.KeepAlive = false;
HttpRequest.ProtocolVersion = HttpVersion.Version10;
https://stackoverflow.com/a/16140621/1302730
I want to submit Emails in this website: https://www.bitdefender.com/site/Facebook/redFreeDownload?nfo%5Bemail%5D=agytnpbu%40gmx.com&nfo%5BreGoogle%5D=auto-detect&g-recaptcha-response=&nfo%5Bhash_page%5D=tsmd-2016-facebook&nfo%5Bsystem-2016%5D=active
There is a captcha, but it doesn't work.
I used Fiddler to get all the information I need. However, I don't know how to read the cookies so that I can submit emails again and again...
How can I do it? Cookies:
__cfduid=d7132cd01781606b71aa23e43dca589bc1536917322; PHPSESSID=a4ujho0plamjc6nq4n31pqu627; _country=il; AWSALB=xAs9RL/e6nVR17J9qYZnooEQedeMW48ZEgx8no+xyhkQxhCjSsrcnc1l/LpjrfL8vBNXdA40agM6Zk3e4i84DGCDXd/7TqGaaYrb5zeyGHbxy8nBZqIGiKUKtKLV; bd112=3ZAxb4MwEIX%2FiyU6hRAgUIUqilRF6dq9ripjH2AFc5Z9lEZV%2Fnup24GhS9Zup3ffu3d6L59sdD2rWEdkK57wZJqmda1JQQODAreWaHjiNQFPTkJCjXjmiQN1cgBHnIYehToMDUbFIxih%2B6g47kV7ocHWY7TdtObj%2B8SdMPbhh3LwhNj2EMCRMFZAICkQbexACkuyE%2FPkLQ4e9gtvJ3z3ZkUbzOSNirNNWsbN72ML0l88gQnrECRJvwNbMdIGPM0cq9IiL3fpfZ5l19UtNTw7NEh6%2Fm1WrAOhlsrfGfnutox%2FWvW2vL5%2BAQ%3D%3D
My code:
// Submit the form parameters through xNet's HttpRequest, proxied via Fiddler.
using (var request = new HttpRequest())
{
    request.Proxy = HttpProxyClient.Parse("127.0.0.1:8888"); // Fiddler's default endpoint
    request.IgnoreProtocolErrors = true;
    request.KeepAlive = true;
    request.ConnectTimeout = 10000;
    request.AllowAutoRedirect = false;
    // Initialize the cookie dictionary so response cookies are stored and re-sent.
    request.Cookies = new CookieDictionary();
    // '@' in the address, not '#'.
    request.AddParam("nfo[email]", "kirtchukdaniel@gmail.com");
    request.AddParam("nfo[reGoogle]", "auto-detect");
    request.AddParam("g-recaptcha-response", "");
    request.AddParam("nfo[hash_page]", "tsmd-2016-facebook");
    request.AddParam("nfo[system-2016]", "active");
    request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134";
    // Do not add the Host header manually: the original "Host " name carried a
    // trailing space and throws; the library derives Host from the request URI.
    request.Referer = "https://www.bitdefender.com/site/Facebook/redFreeDownload?nfo%5Bemail%5D=agytnpbu%40gmx.com&nfo%5BreGoogle%5D=auto-detect&g-recaptcha-response=&nfo%5Bhash_page%5D=tsmd-2016-facebook&nfo%5Bsystem-2016%5D=active";
    var attempt = request.Get("https://www.bitdefender.com/site/Promotions/spreadPromotions/").ToString();
    Console.WriteLine(attempt);
}
Console.ReadKey();
The answer is simple:
request.Cookies = new CookieDictionary();
Also remove this (or it can throw an exception):
request.AddHeader("Host ", "www.bitdefender.com");
// =>
// request.AddHeader("Host ", "www.bitdefender.com");
/// <summary>
/// Downloads two NSE derivatives bhavcopy archives to d:\nse\.
/// </summary>
private void download_nse()
{
    // The two downloads differed only in source URL and target file;
    // the shared logic lives in one helper.
    DownloadBhavcopy(
        "https://www.nseindia.com/content/historical/DERIVATIVES/2015/DEC/fo23DEC2015bhav.csv.zip",
        @"d:\nse\notworking.zip");
    DownloadBhavcopy(
        "https://www.nseindia.com/content/historical/DERIVATIVES/2016/JAN/fo05JAN2016bhav.csv.zip",
        @"d:\nse\working.zip");
}

// Downloads one file with a browser-like User-Agent.
// Failures are logged instead of being swallowed silently (the original had
// empty catch blocks). Windows path literals need the '@' verbatim prefix —
// '#"..."' is not valid C#. Dispose() inside 'using' was redundant and removed.
private static void DownloadBhavcopy(string sourceLocation, string targetPath)
{
    using (WebClient fileReader = new WebClient())
    {
        try
        {
            var ua = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36";
            fileReader.Headers.Add(HttpRequestHeader.UserAgent, ua);
            // NOTE(review): "/" is an unusual Accept value — "*/*" is the
            // conventional accept-all; kept as-is, verify against the server.
            fileReader.Headers["Accept"] = "/";
            fileReader.DownloadFile(new Uri(sourceLocation), targetPath);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Download of " + sourceLocation + " failed: " + ex.Message);
        }
    }
}
I've checked the URL you gave and see that when you select the "Date (DD-MM-YYYY)" and "Select Report" drop-down options and click on "Get Data", a GET request is sent to https://www.nseindia.com/ArchieveSearch with three parameters, like this: "?h_filetype=fobhav&date=02-04-2018&section=FO". And this GET request returns:
<p class="archive_title">F&O - Bhavcopy for 02-04-2018 </p><br>
<br><table cellpadding=5>
<tr>
<td class=t0><a href=/content/historical/DERIVATIVES/2018/APR/fo02APR2018bhav.csv.zip target=new>fo02APR2018bhav.csv.zip</a></td></tr>
<tr>
</table>
</td></tr>
</table>
which contains the link to download the file. So when you send the download request without this GET first, it fails.
You can send the GET request first and then request the returned link, like this:
// First GET the archive-search page (with the same query parameters the site's
// form sends), scrape the real file link out of it, then download that file.
string source_location2 = "https://www.nseindia.com/ArchieveSearch";
Uri uu2 = new Uri(source_location2);
using (WebClient fileReader = new WebClient())
{
    try
    {
        var ua = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36";
        fileReader.Headers.Add(HttpRequestHeader.UserAgent, ua);
        fileReader.Headers["Accept"] = "/";
        fileReader.QueryString.Add("h_filetype", "fobhav");
        fileReader.QueryString.Add("date", "02-04-2018");
        fileReader.QueryString.Add("section", "FO");
        var response = fileReader.DownloadString(uu2);
        // Parse the returned HTML with Html Agility Pack (add the package via
        // NuGet) and pull the download link out of the result table.
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(response);
        var fileLink = htmlDoc.DocumentNode.SelectSingleNode("//table//tr//td//a").Attributes["href"].Value;
        // Now the actual file request. 'using' disposes the client even if the
        // download throws (the original only disposed on success). The target
        // path needs the '@' verbatim prefix — '#"..."' is not valid C#.
        using (var fileReader2 = new WebClient())
        {
            fileReader2.Headers.Add(HttpRequestHeader.UserAgent, ua);
            fileReader2.Headers["Accept"] = "/";
            fileReader2.DownloadFile(new Uri("https://www.nseindia.com" + fileLink), @"d:\notworking.zip");
        }
    }
    catch (Exception e)
    {
        // 'throw;' preserves the original stack trace; 'throw e;' resets it.
        throw;
    }
}
I have a problem using the response of HttpRequest(): I get the response, but only the HTML, not the headers — and the key I am searching for is in the header. This is my code:
HttpRequest rq = new HttpRequest();
rq.Cookies = new CookieDictionary();
rq.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36";
rq.AllowAutoRedirect = true;
rq.IgnoreProtocolErrors = true;
rq.ConnectTimeout = TimeOut;
rq.KeepAlive = true;
// Keep the response object instead of calling ToString() immediately:
// ToString() yields only the body, so searching it for "404" matches any page
// that merely contains that text. Check the typed status code instead.
var response = rq.Get("url");
if (response.StatusCode == HttpStatusCode.NotFound)
{
}
I hope you can help me.
I found the answer — thanks for your help:
// Inspect the typed StatusCode on the response object rather than scanning the
// HTML body text for "404".
var req = rq.Get("url");
if(req.StatusCode.ToString().Contains("NotFound") ){
}