I am trying to get the content of this URL: https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1
However, with the code below the response contains the content of the home page, https://www.eganba.com, instead.
In addition, when I request the first URL via the Postman application, the response is correct.
Do you have any idea why?
WebRequest request = WebRequest.Create("https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
request.Method = "GET";
request.Headers["X-Requested-With"] = "XMLHttpRequest";
WebResponse response = request.GetResponse();
// Reading the body here returns the home page markup instead of the product list.
string html = new StreamReader(response.GetResponseStream()).ReadToEnd();
Use the WebClient class, which lives in System.Net. I think the code below gives you what you need; it returns the page's HTML.
using (WebClient client = new WebClient())
{
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    client.Headers.Add("accept", "text/html");
    var htmlCode = client.DownloadString("https://www.eganba.com/?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
    var result = htmlCode.Contains("Stokta var");
}
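If the server only returns that listing for AJAX calls, the X-Requested-With header from the original request can be sent through WebClient as well; a minimal, untested addition would be to set it before the DownloadString call:

client.Headers.Add("X-Requested-With", "XMLHttpRequest");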
Hope it helps.
Related
I tried to get the source of a particular site's page using the code below, but it failed.
I was able to get the page source in 1~2 seconds using WebBrowser or WebDriver, but HttpWebRequest failed.
I also tried putting the actual browser cookies into the HttpWebRequest, but that failed too.
(Exception: The operation has timed out)
I wonder why it failed and want to learn from the failure.
Thank you in advance!
string Html = String.Empty;
CookieContainer cc = new CookieContainer();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://www.coupang.com/");
req.Method = "GET";
req.Host = "www.coupang.com";
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36";
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3";
req.Headers.Add("Accept-Language", "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7");
req.CookieContainer = cc;
using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (StreamReader str = new StreamReader(res.GetResponseStream(), Encoding.UTF8))
{
    Html = str.ReadToEnd();
}
Removing req.Host from your code should do the trick.
According to the documentation:
If the Host property is not set, then the Host header value to use in an HTTP request is based on the request URI.
You already set the URI in (HttpWebRequest)WebRequest.Create("https://www.coupang.com/") so I don't think doing it again is necessary.
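In other words, keep everything from the question except that one line; a minimal sketch of the setup:

CookieContainer cc = new CookieContainer();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://www.coupang.com/");
req.Method = "GET";
// No req.Host assignment - the Host header is derived from the request URI.
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36";
req.CookieContainer = cc;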
Please let me know if it helps.
I need to get the HTML source of the web page.
This web page is a part of the web site that requires NTLM authentication.
This authentication is silent because Internet Explorer can use Windows log-in credentials.
Is it possible to reuse this silent authentication (i.e. reuse Windows log-in credentials), without making the user enter his/her credentials manually?
The options I have tried are below.
string url = @"http://myWebSite";
//works fine
System.Diagnostics.Process.Start("IExplore.exe", url);
InternetExplorer ie = null;
ie = new SHDocVw.InternetExplorer();
ie.Navigate(url);
//Works up to here, but I do not know how to read the HTML source with SHDocVw
NHtmlUnit.WebClient webClient = new NHtmlUnit.WebClient(BrowserVersion.INTERNET_EXPLORER_8);
HtmlPage htmlPage = webClient.GetHtmlPage(url);
string ghjg = htmlPage.WebResponse.ContentAsString; // Error 401
System.Net.WebClient client = new System.Net.WebClient();
client.Credentials = CredentialCache.DefaultNetworkCredentials;
client.Proxy.Credentials = CredentialCache.DefaultCredentials;
// DefaultNetworkCredentials and DefaultCredentials are empty
client.Headers.Add("user-agent", "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)");
string reply = client.DownloadString(url); // Error 401
HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
IWebProxy proxy = request.Proxy;
// Print the Proxy Url to the console.
if (proxy != null)
{
    // Use the default credentials of the logged-on user.
    proxy.Credentials = CredentialCache.DefaultNetworkCredentials;
    // DefaultNetworkCredentials are empty
}
request.UserAgent = "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)";
request.Accept = "*/*";
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Stream stream = response.GetResponseStream(); // Error 401
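One combination the options above do not show is attaching the default credentials to the request itself rather than only to the proxy. This is only a sketch and assumes the server accepts an NTLM/Negotiate handshake from the current Windows account:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
// Send the current Windows log-in credentials with the request itself
// (the HttpWebRequest attempt above only attached credentials to the proxy).
request.UseDefaultCredentials = true;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd();
}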
When I try to get the HTML page I get this error:
The underlying connection was closed: The connection was closed unexpectedly
I think the site I'm fetching uses some kind of IP-based protection.
WebClient single_page_client = new WebClient();
single_page_client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)");
string cat_page_single = single_page_client.DownloadString(the_url);
How can I do it?
What about using a proxy with WebClient?
EDIT
If I use this code, it works. Why?
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(current_url);
webrequest.KeepAlive = true;
webrequest.Method = "GET";
webrequest.ContentType = "text/html";
webrequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
//webrequest.Connection = "keep-alive";
webrequest.Host = "cat.sabresonicweb.com";
webrequest.Headers.Add("Accept-Language", "en-US,en;q=0.5");
webrequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0";
HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse();
Console.Write(webresponse.StatusCode);
Stream receiveStream = webresponse.GetResponseStream();
Encoding enc = System.Text.Encoding.GetEncoding(1252);//1252
StreamReader loResponseStream = new StreamReader(receiveStream, enc);
string current_page = loResponseStream.ReadToEnd();
loResponseStream.Close();
webresponse.Close();
The first request does not use a header that indicates the length of the result; the connection is simply closed when the transfer finishes.
The second request uses the length header, reads the designated number of bytes, and then closes the connection gracefully (under client-side control instead of a server-driven disconnection).
-or-
The url you sent caused an error on the server. Is there an error in the server log or event viewer?
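If the goal is still to use WebClient, a sketch worth trying is to mirror the headers of the request that works (and, since the question mentions it, a proxy can be attached through the Proxy property); the proxy address below is just a placeholder:

using (WebClient client = new WebClient())
{
    client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0");
    client.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    client.Headers.Add("Accept-Language", "en-US,en;q=0.5");
    // client.Proxy = new WebProxy("proxy-host", 8080);   // hypothetical proxy address
    string page = client.DownloadString(the_url);
}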
My friend is using C# to write a simple program that requests a web page.
However, he ran into a problem when trying to request one particular page.
He has already tried setting all the headers and cookies on the request, but it still gets a timeout exception.
The example page is https://my.ooma.com
Here is the code:
string url = "https://my.ooma.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 30000;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5";
request.Method = "GET";
request.CookieContainer = new CookieContainer();
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers.Add("Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3");
request.Headers.Add("Accept-Encoding:gzip,deflate,sdch");
request.Headers.Add("Accept-Language:en-US,en;q=0.8");
request.KeepAlive = true;
WebResponse myResponse = request.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream());
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
All the headers are the same as when browsing the page with Chrome,
and he didn't see any cookies being set when using the Chrome developer tools.
Can anyone successfully request the page using C#?
Thanks a lot.
Sorry for the late reply.
The following code snippet should work just fine. I also tried it with your old URL that had "getodds.xgi" in it, and it worked fine as well.
The server uses a Secure Sockets Layer (SSL) protocol for connections that use the HTTPS scheme.
You don't need to set any user agent or headers if all you want is the response.
ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
WebRequest request = WebRequest.Create("http://my.ooma.com/");
string htmlResponse = string.Empty;
using (WebResponse response = request.GetResponse())
{
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        htmlResponse = reader.ReadToEnd();
        reader.Close();
    }
    response.Close();
}
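Note that SSL 3.0 is disabled on most servers these days; if the SecurityProtocolType.Ssl3 line above fails with a similar connection error, switching to TLS 1.2 (available from .NET 4.5) is the usual fix:

ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;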
I am making a desktop yellow pages application. I can access the yellow pages sites of all other countries, but not the Australian one, and I don't know why.
Here is the code:
class Program
{
    static void Main(string[] args)
    {
        WebClient wb = new WebClient();
        wb.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)");
        string html = wb.DownloadString("http://www.yellowpages.com.au");
        Console.WriteLine(html);
    }
}
For all other sites I get the HTML of the page, but for the Australian site I get null. I even tried HttpWebRequest as well.
Here is the Australian yellow pages site: http://www.yellowpages.com.au
Thanks in advance.
It looks like that website will only send gzipped data. Try switching to HttpWebRequest and using automatic decompression:
var request = (HttpWebRequest)WebRequest.Create("http://www.yellowpages.com.au");
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)";
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
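The snippet above only configures the request; reading the automatically decompressed body then follows the usual pattern:

using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd();
}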
In addition to bkaid's correct (and upvoted) answer, you can use your own class inherited from WebClient to handle the gzip-compressed HTML transparently:
public class GZipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(address);
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}
Having done this, the following works just fine:
WebClient wb = new GZipWebClient();
string html = wb.DownloadString("http://www.yellowpages.com.au");
When I view the transfer from that website in Wireshark, it flags the response as a malformed HTTP packet: the server announces chunked transfer, then says the following chunk has 0 bytes, and only then sends the page's markup. That's why WebClient returns an empty string (not null), and I think that behavior is correct.
It seems browsers ignore this error, which is why they can display the page properly.
EDIT:
As bkaid pointed out, the server does seem to send a correct gzipped response when asked for one. The following code works for me:
WebClient wb = new WebClient();
wb.Headers.Add("Accept-Encoding", "gzip");
string html;
// GZipStream comes from the System.IO.Compression namespace.
using (var webStream = wb.OpenRead("http://www.yellowpages.com.au"))
using (var gzipStream = new GZipStream(webStream, CompressionMode.Decompress))
using (var streamReader = new StreamReader(gzipStream))
    html = streamReader.ReadToEnd();