Can't access site using WebClient method? - C#

I am making a desktop Yellow Pages application. I can access every other country's Yellow Pages site but not the Australian one, and I don't know why.
Here is the code:
using System;
using System.Net;

class Program
{
    static void Main(string[] args)
    {
        WebClient wb = new WebClient();
        wb.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)");
        string html = wb.DownloadString("http://www.yellowpages.com.au");
        Console.WriteLine(html);
    }
}
For every other site I get the HTML of the page, but for the Australian site I get null. I even tried HttpWebRequest as well.
Here is the Australian Yellow Pages site: http://www.yellowpages.com.au
Thanks in advance

It looks like that website will only send over gzip'ed data. Try switching to HttpWebRequest and using auto decompression:
var request = (HttpWebRequest)WebRequest.Create("http://www.yellowpages.com.au");
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)";
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
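To complete the picture, here is a minimal sketch of actually reading the decompressed body from that request (the variable names are illustrative, not from the original answer):
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    // With AutomaticDecompression enabled, the stream is already plain HTML here.
    string html = reader.ReadToEnd();
    Console.WriteLine(html);
}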

In addition to @bkaid's correct (and upvoted) answer, you can use your own class inherited from WebClient to handle gzip-compressed HTML automatically:
public class GZipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(address);
        request.AutomaticDecompression = DecompressionMethods.GZip |
                                         DecompressionMethods.Deflate;
        return request;
    }
}
Having done this, the following works just fine:
WebClient wb = new GZipWebClient();
string html = wb.DownloadString("http://www.yellowpages.com.au");

When I view the transfer from that website in Wireshark, it says it's a malformed HTTP packet. The server announces chunked transfer encoding, then declares that the following chunk has 0 bytes, and then sends the page's code. That's why WebClient returns an empty string (not null), and I think that's correct behaviour.
It seems browsers ignore this error, which is why they can display the page properly.
EDIT:
As bkaid pointed out, the server does seem to send a correct gzipped response. The following code works for me:
// Requires the System.Net, System.IO and System.IO.Compression namespaces.
WebClient wb = new WebClient();
wb.Headers.Add("Accept-Encoding", "gzip");

string html;
using (var webStream = wb.OpenRead("http://www.yellowpages.com.au"))
using (var gzipStream = new GZipStream(webStream, CompressionMode.Decompress))
using (var streamReader = new StreamReader(gzipStream))
    html = streamReader.ReadToEnd();

Related

Downloading JSON with WebClient results in weird unicode-like characters?

I can make a request in my browser to
https://search.snapchat.com/lookupStory?id=itsmaxwyatt
and it gives me back JSON, but if I do it via WebClient it seems to give me back a very obfuscated string. I can provide it all, but I have truncated it for now:
�x��ƽ���������o�Cj񦌁�_�����˗��89:�/�[��/� h��#l���ٗC��U.�gH�,����qOv�_� �_����σҭ
So, here is the C# code:
using var webClient = new WebClient();
webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:89.0) Gecko/20100101 Firefox/89.0");
webClient.Headers.Add("Host", "search.snapchat.com");
webClient.DownloadString("https://search.snapchat.com/lookupStory?id=itsmaxwyatt");
I have also tried it in an HTTP REST client without any headers, and it still returns JSON.
I also tried it with encoding:
using var webClient = new WebClient();
webClient.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
webClient.Encoding = Encoding.UTF8;
webClient.Headers.Add ("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:89.0) Gecko/20100101 Firefox/89.0");
webClient.Headers.Add("Host", "search.snapchat.com");
Console.WriteLine(Encoding.UTF8.GetString(webClient.DownloadData("https://search.snapchat.com/lookupStory?id=itsmaxwyatt")));
Following @Progman's comment, all you need to do is the following:
// You can define other methods, fields, classes and namespaces here
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}

void Main()
{
    using var webClient = new MyWebClient();
    webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:89.0) Gecko/20100101 Firefox/89.0");
    webClient.Headers.Add("Host", "search.snapchat.com");
    var str = webClient.DownloadString("https://search.snapchat.com/lookupStory?id=itsmaxwyatt");
    Debug.WriteLine(str);
}
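Side note: on newer .NET versions WebClient is considered legacy, and the same automatic-decompression idea can be expressed with HttpClient and HttpClientHandler. This is only a minimal sketch under that assumption, not part of the original answer:
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // HttpClientHandler can decompress gzip/deflate transparently,
        // just like the overridden GetWebRequest above.
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        using var client = new HttpClient(handler);
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:89.0) Gecko/20100101 Firefox/89.0");

        string json = await client.GetStringAsync("https://search.snapchat.com/lookupStory?id=itsmaxwyatt");
        Console.WriteLine(json);
    }
}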

GET method returns root URL content

I am trying to get the content of this URL: https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1
But as a result of the following code the response contains the content of the home page, https://www.eganba.com, instead.
In addition, when I request the first URL via the Postman application the response is correct.
Do you have any idea why?
WebRequest request = WebRequest.Create("https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
request.Method = "GET";
request.Headers["X-Requested-With"] = "XMLHttpRequest";
WebResponse response = request.GetResponse();
Use the WebClient class, which is in System.Net. I think this code gives you what you need; it returns the page's HTML:
using (WebClient client = new WebClient())
{
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    client.Headers.Add("accept", "text/html");
    var htmlCode = client.DownloadString("https://www.eganba.com/?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
    var result = htmlCode.Contains("Stokta var");
}
Hope it helps.

C# WebClient DownloadString returns gibberish

I am attempting to view the source of http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/ using the code:
String URL = "http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/";
WebClient webClient = new WebClient();
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
webClient.Encoding = Encoding.GetEncoding("Windows-1255");
string download = webClient.DownloadString(URL);
webClient.Dispose();
Console.WriteLine(download);
When I run this, the console returns a bunch of nonsense that looks like it has been decoded incorrectly.
I've also attempted adding headers, to no avail:
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
webClient.Headers.Add("Accept-Encoding", "gzip,deflate");
Other websites all return the proper HTML source, and I can also view the page's source through Chrome. What's going on here?
The response from that URL is gzipped; you should either decompress it or set an empty Accept-Encoding header. You don't need that user-agent field.
String URL = "http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/";
WebClient webClient = new WebClient();
webClient.Headers.Add("Accept-Encoding", "");
string download = webClient.DownloadString(URL);
I've had the same thing bug me today, using a WebClient object to check whether a URL is returning something.
But my experience is different. I tried removing the Accept-Encoding, basically using the code @Antonio Bakula gave in his answer, but I kept getting the same error every time (InvalidOperationException).
So this did not work:
WebClient wc = new WebClient();
wc.Headers.Add("Accept-Encoding", "");
string result = wc.DownloadString(url);
But adding 'any' text as a User Agent instead did do the trick. This worked fine:
WebClient wc = new WebClient();
wc.Headers.Add(HttpRequestHeader.UserAgent, "My User Agent String");
System.IO.Stream stream = wc.OpenRead(url);
Your mileage may vary, obviously. Also of note: I'm using ASP.NET 4.0.30319.
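If you need the body as well, and not just a reachability check, a small sketch along the same lines (assuming the same url variable) would read and dispose the stream explicitly:
WebClient wc = new WebClient();
wc.Headers.Add(HttpRequestHeader.UserAgent, "My User Agent String");

// Dispose the stream once the content has been read.
using (System.IO.Stream stream = wc.OpenRead(url))
using (var reader = new System.IO.StreamReader(stream))
{
    string result = reader.ReadToEnd();
    Console.WriteLine(result);
}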

C# request for a webpage fails but succeeds in a web browser (headers and cookies checked)

My friend is using C# to write a simple program for requesting a webpage.
However, he encounters a problem when trying to request one specific webpage.
He has already tried setting all the headers and cookies inside the request, but he still gets a timeout exception.
The example webpage is https://my.ooma.com
Here is the code:
string url = "https://my.ooma.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 30000;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.52 Safari/536.5";
request.Method = "GET";
request.CookieContainer = new CookieContainer();
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers.Add("Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3");
request.Headers.Add("Accept-Encoding:gzip,deflate,sdch");
request.Headers.Add("Accept-Language:en-US,en;q=0.8");
request.KeepAlive = true;
WebResponse myResponse = request.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream());
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
All the headers are the same as when using Chrome to browse the webpage, and he didn't see any cookies being set when using the Chrome developer tools.
Can anyone successfully request the page using C#?
Thanks a lot.
Sorry for being late.
The following code snippet should work just fine. I also tried it with your old URL that had "getodds.xgi" in it, and it also worked fine.
The server uses the Secure Sockets Layer (SSL) protocol for connections that use the Secure Hypertext Transfer Protocol (HTTPS) scheme only.
You don't need to set any UserAgent or headers if they were just intended to get the response.
ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
WebRequest request = WebRequest.Create("http://my.ooma.com/");
string htmlResponse = string.Empty;

using (WebResponse response = request.GetResponse())
{
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        htmlResponse = reader.ReadToEnd();
        reader.Close();
    }
    response.Close();
}
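Note that SecurityProtocolType.Ssl3 reflects the era of that answer; most present-day HTTPS servers require TLS, so on .NET Framework 4.5 or later you would more likely (this is an assumption about the target server, not something stated in the question) enable TLS 1.2 instead:
// TLS 1.2 is a more realistic minimum for current HTTPS endpoints.
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;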

Trouble with getting web page's HTML code from my C# program

The problem:
I want to scrape some data from a certain webpage (I have administrative access) and store some information in a database for later analysis.
Sounds easy, right?
I've decided to make a simple console prototype, and the code looks something like this:
string uri = @"http://s7.iqstreaming.com:8044/admin.cgi";
HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;

if (request == null)
{
    Console.WriteLine(":( This shouldn't happen!");
    Console.ReadKey();
}

request.ContentType = @"text/html";
request.Accept = @"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Credentials = new NetworkCredential("myID", "myPass");

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    StreamReader reader = new StreamReader(response.GetResponseStream());
    while (!reader.EndOfStream)
    {
        Console.WriteLine(reader.ReadLine());
    }
    reader.Close();
    response.Close();
}
This code works on most other sites, but here I get a 404 error (most of the time), a 502, or a timeout.
I've consulted Firebug (I took the Accept and compression info from there) but to no avail.
Using WinForms and the WebBrowser control as an alternative is not an option (at least for now).
P.S.
The same thing happens when I try to get the HTML from http://s7.iqstreaming.com:8044/index.html (which doesn't need credentials).
I think the problem is related to the User-Agent. This may solve it:
request.UserAgent="Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11";
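Put together with the rest of the question's request setup, a minimal sketch could look like the following (the URI and credentials are simply reused from the question, so treat them as placeholders):
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(@"http://s7.iqstreaming.com:8044/admin.cgi");
request.UserAgent = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11";
request.Accept = @"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.Credentials = new NetworkCredential("myID", "myPass");

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    Console.WriteLine(reader.ReadToEnd());
}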