C# WebClient DownloadString returns gibberish

I am attempting to view the source of http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/ using the code:
String URL = "http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/";
WebClient webClient = new WebClient();
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
webClient.Encoding = Encoding.GetEncoding("Windows-1255");
string download = webClient.DownloadString(URL);
webClient.Dispose();
Console.WriteLine(download);
When I run this, the console returns a bunch of nonsense that looks like it's been decoded incorrectly.
I've also attempted adding headers, to no avail:
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
webClient.Headers.Add("Accept-Encoding", "gzip,deflate");
Other websites all returned the proper HTML source. I can also view the page's source through Chrome. What's going on here?

The response from that URL is gzipped; you should either decompress it or set an empty Accept-Encoding header. You don't need that user-agent field.
String URL = "http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/";
WebClient webClient = new WebClient();
webClient.Headers.Add("Accept-Encoding", "");
string download = webClient.DownloadString(URL);
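(If you would rather leave compression on and decompress the response yourself, the GZipWebClient subclass and the GZipStream example in the answers further down this page show both ways to do that.)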

I've had the same thing bug me today, while using a WebClient object to check whether a URL was returning something.
But my experience was different: I tried removing the Accept-Encoding, basically using the code @Antonio Bakula gave in his answer, but I kept getting the same error every time (InvalidOperationException).
So this did not work:
WebClient wc = new WebClient();
wc.Headers.Add("Accept-Encoding", "");
string result = wc.DownloadString(url);
But adding 'any' text as a User Agent instead did do the trick. This worked fine:
WebClient wc = new WebClient();
wc.Headers.Add(HttpRequestHeader.UserAgent, "My User Agent String");
System.IO.Stream stream = wc.OpenRead(url);
Your mileage may vary, obviously. Also of note: I'm using ASP.NET 4.0.30319.
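For completeness, a minimal sketch of reading the body from the stream opened above (assuming the stream variable from the snippet; encoding detection is left to StreamReader's defaults):
// Continuing from the snippet above: read the body and dispose the stream.
using (var reader = new System.IO.StreamReader(stream))
{
    string result = reader.ReadToEnd();
    Console.WriteLine(result);
}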

Related

GET method returns root URL content

I am trying to get the content of this URL: https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1
But as a result of the following code, the response contains the content of the home page, https://www.eganba.com, instead.
In addition, when I try to get the first URL's content via the Postman application, the response is correct.
Do you have any idea?
WebRequest request = WebRequest.Create("https://www.eganba.com/index.php?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
request.Method = "GET";
request.Headers["X-Requested-With"] = "XMLHttpRequest";
WebResponse response = request.GetResponse();
Use the WebClient class, which is inside System.Net. I think this code gives you what you need; it returns the page's HTML:
using (WebClient client = new WebClient())
{
client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
client.Headers.Add("accept", "text/html");
var htmlCode = client.DownloadString("https://www.eganba.com/?p=Products&ctg_id=2000&sort_type=rel-desc&view=0&page=1");
var result = htmlCode.Contains("Stokta var"); // Contains already returns a bool; checks for the "Stokta var" (in stock) marker
}
Hope it helps.

Output of PHP script in C#

The main problem, I think, is that I am trying to get the output of a PHP script on an SSL-protected website. Why doesn't the following code work?
string URL = "https://mtgox.com/api/0/data/ticker.php";
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(URL);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader _sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = _sr.ReadToEnd();
//Console.WriteLine(result);
result = result.Replace('\n', ' ');
_sr.Close();
myResponse.Close();
Console.WriteLine(result);
It hangs with an unhandled WebException: "The operation has timed out".
You're hitting the wrong URL. SSL is https://, but you're hitting http:// (note the missing s). The site does redirect to the SSL version of the page, but your code is apparently not following that redirect.
I added myRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"; and everything started working.
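For reference, a minimal sketch of the fixed request (the same code as above with only the UserAgent line added; whether this old API endpoint still responds is another matter):
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(URL);
myRequest.Method = "GET";
// Some servers stall or reject requests that carry no User-Agent header.
myRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11";
using (WebResponse myResponse = myRequest.GetResponse())
using (StreamReader _sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8))
{
    Console.WriteLine(_sr.ReadToEnd().Replace('\n', ' '));
}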

C# WebClient OpenRead

Could someone help with parsing a website?
I have parsed lots of sites, but this one is interesting: the inner code is generated dynamically by a PHP file. So I tried to use WebClient like this:
WebClient client = new WebClient();
string postData = "getProducts=1&category=340&brand=0";
byte[] byteArray = Encoding.UTF8.GetBytes(postData);
client.Headers.Add("POST", "/ajax.php HTTP/1.1");
client.Headers.Add("Host", site);
client.Headers.Add("Connection", "keep-alive");
client.Headers.Add("Origin", "http://massup.ru");
client.Headers.Add("X-Requested-With", "XMLHttpRequest");
client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11");
client.Headers.Add("Accept", "*/*");
client.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
client.Headers.Add("Content-length", byteArray.Length.ToString());
client.Headers.Add("Referer", "http://massup.ru/category/proteini");
client.Headers.Add("Accept-Encoding", "gzip,deflate,sdch");
client.Headers.Add("Accept-Language", "ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4");
client.Headers.Add("Accept-Charset", "windows-1251,utf-8;q=0.7,*;q=0.3");
client.Headers.Add("Cookie", "cart=933a71dfee2baf8573dfc2094a937f0d; r_v=YToyOntpOjA7YTo3OntzOjU6Im1vZGVsIjtzOjI2OiIxMDAlIFdoZXkgUHJvdGVpbiA5MDgg0LPRgCI7czozOiJ1cmwiO3M6MzQ6Im11bHRpcG93ZXItMTAwLXdoZXktcHJvdGVpbi05MDgtZ3IiO3M6NToiYnJhbmQiO3M6MTA6Ik11bHRpcG93ZXIiO3M6ODoiY2F0ZWdvcnkiO3M6Mzk6ItCh0YvQstC%2B0YDQvtGC0L7Rh9C90YvQtSDQuNC30L7Qu9GP0YLRiyI7czo5OiJzY2F0ZWdvcnkiO3M6Mzc6ItCh0YvQstC%2B0YDQvtGC0L7Rh9C90YvQuSDQuNC30L7Qu9GP0YIiO3M6NToicHJpY2UiO3M6MToiMCI7czo0OiJpY29uIjtzOjM3OiJodHRwOi8vbWFzc3VwLnJ1L2ltYWdlcy9pY29uXzQ3NTIuanBnIjt9aToxO2E6Nzp7czo1OiJtb2RlbCI7czoxNzoiTWF0cml4IDIuMCA5ODQg0LMiO3M6MzoidXJsIjtzOjE2OiJtYXRyaXgtMi0wLTk4NC1nIjtzOjU6ImJyYW5kIjtzOjc6IlN5bnRyYXgiO3M6ODoiY2F0ZWdvcnkiO3M6Mzk6ItCh0YvQstC%2B0YDQvtGC0L7Rh9C90YvQtSDQuNC30L7Qu9GP0YLRiyI7czo5OiJzY2F0ZWdvcnkiO3M6Mzc6ItCh0YvQstC%2B0YDQvtGC0L7Rh9C90YvQuSDQuNC30L7Qu9GP0YIiO3M6NToicHJpY2UiO3M6NDoiMTE5MCI7czo0OiJpY29uIjtzOjM3OiJodHRwOi8vbWFzc3VwLnJ1L2ltYWdlcy9pY29uXzEwMDguanBnIjt9fQ%3D%3D; PHPSESSID=933a71dfee2baf8573dfc2094a937f0d");
Stream data = client.OpenRead("http://massup.ru/ajax.php");
StreamReader reader = new StreamReader(data);
string s = reader.ReadToEnd();
Console.WriteLine(s);
data.Close();
reader.Close();
But it gives me an error!
Could someone help me with this kind of parsing?
See my answer to C# https login and download file, which has working code that correctly handles HTTP POSTs, then clean up what you have based on it. After you've done that, if you still need help, post your updated code and a clearer description of what specific errors or exceptions you are seeing.
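In the meantime, here is a minimal sketch of how that POST is usually written with WebClient. UploadValues builds the application/x-www-form-urlencoded body and sets Content-Type and Content-Length for you, so most of the hand-written headers above become unnecessary (the field names come from your postData string; this is untested against massup.ru):
using System.Collections.Specialized;
using System.Net;
using System.Text;

using (WebClient client = new WebClient())
{
    var form = new NameValueCollection();
    form["getProducts"] = "1";
    form["category"] = "340";
    form["brand"] = "0";
    // UploadValues sends the POST and returns the raw response bytes.
    byte[] responseBytes = client.UploadValues("http://massup.ru/ajax.php", form);
    Console.WriteLine(Encoding.UTF8.GetString(responseBytes));
}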

Can't access site using WebClient method?

I am making a desktop yellow pages application. I can access every country's yellow pages site except the Australian one, and I don't know why.
Here is the code:
class Program
{
static void Main(string[] args)
{
WebClient wb = new WebClient();
wb.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)");
string html = wb.DownloadString("http://www.yellowpages.com.au");
Console.WriteLine(html);
}
}
For every other site I get the HTML of the page, but for the Australian site I get null. I even tried HttpWebRequest as well.
Here is the Australian yellow pages site: http://www.yellowpages.com.au
Thanks in advance
It looks like that website will only send gzipped data. Try switching to HttpWebRequest and using automatic decompression:
var request = (HttpWebRequest)WebRequest.Create("http://www.yellowpages.com.au");
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705;)";
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
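To actually read the page with that request, a short continuation (a sketch that assumes the request object above):
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd();
    Console.WriteLine(html);
}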
In addition to @bkaid's correct (and upvoted) answer, you can use your own class inherited from WebClient to uncompress/handle gzip-compressed HTML:
public class GZipWebClient : WebClient
{
protected override WebRequest GetWebRequest(Uri address)
{
HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(address);
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
return request;
}
}
Having done this, the following works just fine:
WebClient wb = new GZipWebClient();
string html = wb.DownloadString("http://www.yellowpages.com.au");
When I view the transfer from that website in Wireshark, it's flagged as a malformed HTTP packet: the response declares chunked transfer, then announces a zero-byte chunk, and then sends the page source anyway. That's why WebClient returns an empty string (not null), and I think that's correct behavior.
It seems browsers ignore this error and so they can display the page properly.
EDIT:
As bkaid pointed out, the server does send a correct gzipped response. The following code works for me:
WebClient wb = new WebClient();
wb.Headers.Add("Accept-Encoding", "gzip");
string html;
using (var webStream = wb.OpenRead("http://www.yellowpages.com.au"))
using (var gzipStream = new GZipStream(webStream, CompressionMode.Decompress))
using (var streamReader = new StreamReader(gzipStream))
html = streamReader.ReadToEnd();
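(GZipStream lives in the System.IO.Compression namespace, so add a using directive for it.)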

Getting HTML source from URL, inline CSS problem

I have a strange problem.
I am getting the HTML source from a URL using this:
string html;
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(Url);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
html = reader.ReadToEnd();
reader.Close();
}
response.Close();
}
The page that I am requesting has inline CSS like this:
<span class="VL" style="display:inline-block;height:20px;width:0px;"></span>
But the html variable contains only:
<span class="VL" style="display:inline-block;"></span>
Does anyone know why? I have tested with many encodings, and with WebRequest and WebClient too, but it doesn't work either.
You might need to send a User-Agent so that the site doesn't think you are a bot; some sites don't bother with CSS when the request comes from a bot. Also, reading the remote HTML could be simplified by using a WebClient:
using (var client = new WebClient())
{
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4";
string html = client.DownloadString(url);
}
Are you viewing the source through a browser development tool, by clicking Inspect Element? It's possible you are viewing the source in a browser that adds the height and width properties on the client side through JavaScript, showing you the modified style.
