I tried to get the source of a particular site page using the code below, but it failed.
I was able to get the page source in 1-2 seconds using a WebBrowser control or a WebDriver, but HttpWebRequest failed.
I tried putting the actual browser cookies into the HttpWebRequest, but that failed too.
(Exception: The operation has timed out)
I wonder why it failed and want to learn from the failure.
Thank you in advance!
string Html = String.Empty;
CookieContainer cc = new CookieContainer();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://www.coupang.com/");
req.Method = "GET";
req.Host = "www.coupang.com";
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36";
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3";
req.Headers.Add("Accept-Language", "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7");
req.CookieContainer = cc;
using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (StreamReader str = new StreamReader(res.GetResponseStream(), Encoding.UTF8))
{
    Html = str.ReadToEnd();
}
Removing req.Host from your code should do the trick.
According to the documentation:
If the Host property is not set, then the Host header value to use in an HTTP request is based on the request URI.
You already set the URI in (HttpWebRequest)WebRequest.Create("https://www.coupang.com/"), so I don't think setting it again is necessary.
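For reference, a minimal sketch of the same request with that line removed (Accept header trimmed for brevity; everything else as in your code):

HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://www.coupang.com/");
req.Method = "GET";
// No req.Host here; the Host header is derived from the request URI.
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36";
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
req.Headers.Add("Accept-Language", "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7");
req.CookieContainer = new CookieContainer();

using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (StreamReader reader = new StreamReader(res.GetResponseStream(), Encoding.UTF8))
{
    string Html = reader.ReadToEnd();
}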
Please let me know if it helps.
I used the Fiddler extension RequestToCode to replay a POST from logging into Yahoo.
When I run the code, I can see in Fiddler that the login was successful and there are 10 cookies in the response.
In my code, though, response.Cookies had a count of 0.
So I updated my HttpWebRequest and set:
request.CookieContainer = new CookieContainer();
When I run the code again and look at it in Fiddler, I see the login failed because the response navigates to a failed-login URL.
My ultimate goal is to get the cookies from the login attempt to use in a later Get request to Yahoo.
Why is setting the cookie container causing a failure?
Maybe it's because you're initializing a new CookieContainer on every request.
Declare a public variable:
CookieContainer cookies = new CookieContainer();
Now your new requests will use the same CookieContainer. For example:
var request = (HttpWebRequest)WebRequest.Create("https://www.yahoo.com/");
request.CookieContainer = cookies;   // reuse the shared container
request.Method = "GET";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.Headers.Add("accept-language", "en,hr;q=0.9");
request.Headers.Add("accept-encoding", "");
request.Headers.Add("Upgrade-Insecure-Requests", "1");
WebResponse response = request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
string responseFromServer = reader.ReadToEnd();
reader.Close();
response.Close();
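To show the whole flow, here is a minimal sketch, with a placeholder login URL and form body, of a POST login followed by a GET that reuses the same shared container:

// Shared across all requests so cookies persist between them.
static readonly CookieContainer cookies = new CookieContainer();

static void PostLogin(string url, string formBody)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.CookieContainer = cookies;   // login cookies land in the shared container
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    byte[] data = Encoding.UTF8.GetBytes(formBody);
    using (var stream = request.GetRequestStream())
    {
        stream.Write(data, 0, data.Length);
    }
    using (request.GetResponse()) { }    // Set-Cookie values are captured automatically
}

static string Get(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.CookieContainer = cookies;   // same container, so the login cookies are sent
    request.Method = "GET";
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        return reader.ReadToEnd();
    }
}

After PostLogin(...) succeeds, any later Get(...) call sends the captured cookies without any extra work.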
I'm trying to log in to Yahoo! using an HttpWebRequest, but I'm having trouble getting the initial cookie that they set. I'm not sure if this is a problem with my request/response, or if they set the cookie in some nefarious way to prevent this kind of activity.
So here's the first part of my Connect() method, which, to start with, simply gets the login page so I can grab the hidden authentication fields and cookies:
public void Connect()
{
    var LoginUrl = "https://login.yahoo.com/config/login";
    var cookieContainer = new CookieContainer();

    // First get a login page to grab some important values
    var request = WebRequest.Create(LoginUrl) as HttpWebRequest;
    request.Method = "GET";
    request.CookieContainer = cookieContainer;
    Console.WriteLine(request.SupportsCookieContainer);

    var response = request.GetResponse() as HttpWebResponse; /* LINE:30 */
    var loginPageText = string.Empty;
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        loginPageText = reader.ReadToEnd();
    }
}
If I inspect the response object at line 30, I don't even see any Set-Cookie headers. If I visit the same page manually in Chrome, I see the following header being sent back:
Set-Cookie:B=bgg40ppbditpf&b=3&s=4s; expires=Mon, 05-Mar-2018 11:53:19 GMT; path=/; domain=.yahoo.com
What could cause those headers to not appear?
I see no cookies either, but if I fake being a browser:
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36";
response.Cookies[0] is set.
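Applied to the Connect() method above, a minimal sketch of the change (the User-Agent string is just an example browser signature):

var cookieContainer = new CookieContainer();
var request = (HttpWebRequest)WebRequest.Create("https://login.yahoo.com/config/login");
request.Method = "GET";
request.CookieContainer = cookieContainer;
// Without a browser-like User-Agent the server never sends Set-Cookie.
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36";

using (var response = (HttpWebResponse)request.GetResponse())
{
    Console.WriteLine(response.Cookies.Count);   // now non-zero
}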
The main problem, I think, is that I am trying to get the output of a PHP script on an SSL-protected website. Why doesn't the following code work?
string URL = "https://mtgox.com/api/0/data/ticker.php";
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(URL);
myRequest.Method = "GET";

WebResponse myResponse = myRequest.GetResponse();
StreamReader _sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = _sr.ReadToEnd();
//Console.WriteLine(result);
result = result.Replace('\n', ' ');
_sr.Close();
myResponse.Close();
Console.WriteLine(result);
It hangs with an unhandled WebException: "The operation has timed out".
You're hitting the wrong URL. SSL is https://, but you're hitting http:// (note the lack of S). The site does redirect to the SSL version of the page, but your code is apparently not following that redirect.
I added myRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"; and everything started working.
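For completeness, a minimal sketch of the working request, combining the https:// URL with that User-Agent (AllowAutoRedirect is true by default and is shown here only to make the redirect handling explicit):

HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create("https://mtgox.com/api/0/data/ticker.php");
myRequest.Method = "GET";
myRequest.AllowAutoRedirect = true;   // follow any redirect to the SSL page
myRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11";

using (WebResponse myResponse = myRequest.GetResponse())
using (StreamReader reader = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8))
{
    Console.WriteLine(reader.ReadToEnd().Replace('\n', ' '));
}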
My friend is using C# to write a simple program that requests a webpage.
However, he encountered a problem when trying to request a specific webpage.
He has already tried setting all the headers and cookies inside the request, but it still gets a timeout exception.
The example webpage is https://my.ooma.com
Here is the code:
string url = "https://my.ooma.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 30000;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5";
request.Method = "GET";
request.CookieContainer = new CookieContainer();
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers.Add("Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3");
request.Headers.Add("Accept-Encoding:gzip,deflate,sdch");
request.Headers.Add("Accept-Language:en-US,en;q=0.8");
request.KeepAlive = true;
WebResponse myResponse = request.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream());
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
All the headers are the same as when using Chrome to browse the webpage.
And he didn't see any cookies being set when using the Chrome developer tools.
Can anyone successfully request the page using C#?
Thanks a lot.
Sorry for being late.
The following code snippet should work just fine. I also tried it with your old URL that had "getodds.xgi" in it, and it also worked fine.
The server uses the Secure Sockets Layer (SSL) protocol for connections that use the Secure Hypertext Transfer Protocol (HTTPS) scheme only.
You don't need to set any UserAgent or headers if they were just intended to get a response.
ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
WebRequest request = WebRequest.Create("http://my.ooma.com/");
string htmlResponse = string.Empty;

using (WebResponse response = request.GetResponse())
{
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        htmlResponse = reader.ReadToEnd();
        reader.Close();
    }
    response.Close();
}
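One caveat: SSL 3 has since been widely disabled on servers, so if the handshake times out on a modern framework, selecting TLS instead may be what's needed:

// .NET 4.5+; most servers now reject SSL 3 entirely.
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;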
I have an HttpWebRequest with a StreamReader that works fine without a WebProxy. When I use a WebProxy, the StreamReader reads strange characters instead of the actual HTML. Here is the code.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://URL");
req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10";
req.Accept = "application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
req.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
req.Headers.Add("Accept-Language", "en-US,en;q=0.8");
req.Method = "GET";
req.CookieContainer = new CookieContainer();
WebProxy proxy = new WebProxy("proxyIP:proxyPort");
proxy.Credentials = new NetworkCredential("proxyUser", "proxyPass");
req.Proxy = proxy;   // 'proxy' is the local variable declared just above
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
StreamReader reader = new StreamReader(res.GetResponseStream());
string html = reader.ReadToEnd();
Without the WebProxy, the variable html holds the expected HTML string from the URL. But with a WebProxy, html holds a value like this:
"�\b\0\0\0\0\0\0��]r����s�Y����\0\tP\"]ki���ػ��-��X�\0\f���/�!�HU���>Cr���P$%�nR�� z�g��3�t�~q3�ٵȋ(M���14&?\r�d:�ex�j��p������.��Y��o�|��ӎu�OO.�����\v]?}�~������E:�b��Lן�Ԙ6+�l���岳�Y��y'ͧ��~#5ϩ�it�2��5��%�p��E�L����t&x0:-�2��i�C���$M��_6��zU�t.J�>C-��GY��k�O�R$�P�T��8+�*]HY\"���$Ō�-�r�ʙ�H3\f8Jd���Q(:�G�E���r���Rܔ�ڨ�����W�<]$����i>8\b�p� �\= 4\f�> �&��$��\v��C��C�vC��x�p�|\"b9�ʤ�\r%i��w#��\t�r�M�� �����!�G�jP�8.D�k�Xʹt�J��/\v!�r��y\f7<����\",\a�/IK���ۚ�r�����ҿ5�;���}h��+Q��IO]�8��c����n�AGڟu2>�
Since you are passing
req.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
I would say your proxy compresses the stream before sending it back to you.
Inspect the headers of the response to check the encoding.
Then just use GZip to decompress it.
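Two possible fixes, sketched below: let HttpWebRequest inflate the body for you (drop the manual Accept-Encoding header in that case), or decompress by hand with GZipStream from System.IO.Compression after checking the response's Content-Encoding:

// Option 1: automatic - the framework sends Accept-Encoding and decompresses for you.
req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

// Option 2: manual - keep the header and unwrap the gzip stream yourself.
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
Stream body = res.GetResponseStream();
if (string.Equals(res.ContentEncoding, "gzip", StringComparison.OrdinalIgnoreCase))
{
    body = new GZipStream(body, CompressionMode.Decompress);
}
using (StreamReader reader = new StreamReader(body))
{
    string html = reader.ReadToEnd();   // readable HTML instead of raw gzip bytes
}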