I've tested the code below locally in my MVC web application and it works fine, but it always returns a blank string when run on my live web server. I've tried different UserAgent values with no success. I've also set up a Windows Forms app on the web server and tested the code there, and it downloads fine. I'm thinking it may be some setting within my web.config file, but I have very little understanding of how the web.config file works.
public class Web
{
    private const string UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0";

    public static string DownloadPage(string url)
    {
        // WebClient is IDisposable, so release it deterministically.
        using (var client = new WebClient())
        {
            client.Encoding = System.Text.Encoding.UTF8;
            client.Headers[HttpRequestHeader.UserAgent] = UserAgent;
            return client.DownloadString(url);
        }
    }
}
Are you sure you don't need to authenticate?
You don't need the UserAgent.
Try it like this; this is how I got it working. You can now at least see the exception it throws.
public string RequestUrl(string reqUrl)
{
    var client = new WebClient();
    try
    {
        return client.DownloadString(reqUrl);
    }
    catch (Exception e)
    {
        // Surface the exception text instead of silently returning nothing.
        return e.ToString();
    }
}
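If the blank string turns out to be a server-side error, catching WebException specifically lets you read the body the server sent along with the error status. A sketch of that variant (the when filter needs C# 6 or later; System.Net and System.IO assumed):

public string RequestUrl(string reqUrl)
{
    using (var client = new WebClient())
    {
        try
        {
            return client.DownloadString(reqUrl);
        }
        catch (WebException ex) when (ex.Response != null)
        {
            // The error page the server returned often explains the failure.
            using (var reader = new System.IO.StreamReader(ex.Response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}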
I'd like to create a tool to check whether a URL is valid (valid: it returns a 200). I have two examples of airline check-in pages, and both work correctly in the browser. However, the British Airways one always throws an exception because of a 500 response. What is wrong with my code?
static void Main(string[] args)
{
    var testUrl1 = new Program().UrlIsValid("https://www.klm.com/ams/checkin/web/kl/nl/nl");
    var testUrl2 = new Program().UrlIsValid("https://www.britishairways.com/travel/olcilandingpageauthreq/public/en_gb");
    Console.WriteLine(testUrl1 + "\t - https://www.klm.com/ams/checkin/web/kl/nl/nl");
    Console.WriteLine(testUrl2 + "\t - https://www.britishairways.com/travel/olcilandingpageauthreq/public/en_gb");
}
public bool UrlIsValid(string onlineCheckInUrl)
{
    try
    {
        var request = (HttpWebRequest)WebRequest.Create(onlineCheckInUrl);
        request.Method = "GET";
        // Dispose the response so the connection is released.
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return (response.StatusCode == HttpStatusCode.OK);
        }
    }
    catch (Exception)
    {
        // GetResponse throws on 4xx/5xx, so those all end up here.
        return false;
    }
}
A lot of sites block obvious bot activity. The British Airways URL you show works for me if I set a valid User-Agent request header:
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0";
Keep in mind that 200 OK is not the only response that means the URL is valid, so this method of testing will be unreliable at best. You may have to narrow your definition of what a valid URL means, or at least expect things to change on a site-by-site basis.
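As a sketch of that broader check, reusing the method shape from the question: accept any 2xx status and log whatever non-2xx code the server returns (the thresholds here are one possible definition of "valid", not the only one):

public bool UrlIsValid(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0";
    try
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // Anything in the 2xx range counts as "valid" here;
            // redirects are followed automatically by default.
            return (int)response.StatusCode < 300;
        }
    }
    catch (WebException ex)
    {
        // 4xx/5xx land here, but the response still carries the status code.
        var response = ex.Response as HttpWebResponse;
        if (response != null)
            Console.WriteLine("Server answered: " + (int)response.StatusCode);
        return false;
    }
}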
I am using the Habbo API to check whether a name is valid, but I'm receiving a 401 Unauthorized error.
Below is the code I'm using. It worked when I copied my Cookie header from Chrome and added that as a header, but is there another way, an actual fix?
private void Form1_Load(object sender, EventArgs e)
{
    try
    {
        // Avoid naming the variable after the type; it shadows WebClient.
        using (var webClient = new WebClient())
        {
            webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30");
            MessageBox.Show(webClient.DownloadString("https://www.habbo.com/api/user/avatars/check-name?name=123"));
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }
}
The API https://www.habbo.com/api/user/avatars/check-name that you are referring to won't load without a proper authorization token, since it is not publicly available.
To test further, use the public API https://www.habbo.com/api/public/users?name=
You will be able to get a response without any issues.
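For example, a minimal sketch of the same check against the public endpoint (the name value is just an example):

using (var client = new WebClient())
{
    // A User-Agent is kept here since the site rejected the default one above.
    client.Headers.Add("User-Agent", "Mozilla/5.0");
    string json = client.DownloadString("https://www.habbo.com/api/public/users?name=123");
    MessageBox.Show(json);
}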
The 401 error indicates that you need to add (Basic) authentication to your HTTP request (and remove the cookie that you added):
string username = "username";
string password = "password";
string credentials = Convert.ToBase64String(Encoding.ASCII.GetBytes(username + ":" + password));
webClient.Headers[HttpRequestHeader.Authorization] = "Basic " + credentials;
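If the endpoint really does challenge with Basic auth, WebClient can also build that header for you; a sketch of the equivalent using the Credentials property:

// Sent in response to the server's 401 challenge rather than preemptively.
webClient.Credentials = new NetworkCredential(username, password);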
I'm trying to download a file from the server. While passing a link to DownloadFile, it throws the error
**URI formats are not supported** and points at "link", the string that contains the server file address:
string link = "http:\\www.nse-india.com\DERIVATIVES\2012\AUG\fo22AUG2012bhav.csv.zip";
WebClient wc = new WebClient();
var ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";
wc.Headers.Add(HttpRequestHeader.UserAgent, ua);
wc.Headers["Accept"] = "*/*";
and the downloading code goes like this:
try
{
    wc.DownloadFile(link, "H:\\ZipTest\\ZipText\\nt.zip"); // Error is thrown here
    _status = true;
    fileCount++;
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
    _status = false;
}
If I use the same address in a web browser, it downloads properly, and if I substitute other files the same code downloads them fine. It's only this particular file that causes the problem. Any ideas?
The URL needs a little modification. Change
string link =
    "http:\www.nse-india.com\DERIVATIVES\2012\AUG\fo22AUG2012bhav.csv.zip"
to
string link =
    "http://www.nse-india.com/DERIVATIVES/2012/AUG/fo22AUG2012bhav.csv.zip"
Your URL is corrupted: http:\www.nse-indi........ It should be something like http://www.nse-india.com/DERIVATIVES/2012/AUG/fo22AUG2012bhav.csv.zip
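If the backslashes come from stored data you don't control, a small normalization before the request avoids the exception. A sketch, assuming the link uses backslashes throughout as shown in the question:

// Hypothetical pre-processing: turn "http:\\host\path" into "http://host/path".
string fixedLink = link.Replace('\\', '/');
wc.DownloadFile(fixedLink, "H:\\ZipTest\\ZipText\\nt.zip");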
I am attempting to view the source of http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/ using the code:
String URL = "http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/";
WebClient webClient = new WebClient();
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
webClient.Encoding = Encoding.GetEncoding("Windows-1255");
string download = webClient.DownloadString(URL);
webClient.Dispose();
Console.WriteLine(download);
When I run this, the console returns a bunch of nonsense that looks like it's been decoded incorrectly.
I've also attempted adding headers, to no avail:
webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");
webClient.Headers.Add("Accept-Encoding", "gzip,deflate");
Other websites all returned the proper HTML source. I can also view the page's source through Chrome. What's going on here?
The response from that URL is gzipped; you should decompress it or set an empty Accept-Encoding header. You don't need the user-agent field.
String URL = "http://simpledesktops.com/browse/desktops/2012/may/17/where-the-wild-things-are/";
WebClient webClient = new WebClient();
webClient.Headers.Add("Accept-Encoding", "");
string download = webClient.DownloadString(URL);
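If you would rather keep compressed transfers and decompress automatically, a common pattern is to subclass WebClient and enable automatic decompression on the underlying request. A sketch (the class name is mine):

class GzipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null)
        {
            // Sends Accept-Encoding: gzip, deflate and inflates the response.
            http.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}

Then new GzipWebClient().DownloadString(URL) returns readable text.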
I've had the same thing bug me today.
Using a WebClient object to check whether a URL is returning something.
But my experience is different. I tried removing the Accept-Encoding header, basically using the code Antonio Bakula gave in his answer, but I kept getting the same error every time (InvalidOperationException).
So this did not work:
WebClient wc = new WebClient();
wc.Headers.Add("Accept-Encoding", "");
string result = wc.DownloadString(url);
But adding 'any' text as a User Agent instead did do the trick. This worked fine:
WebClient wc = new WebClient();
wc.Headers.Add(HttpRequestHeader.UserAgent, "My User Agent String");
// OpenRead succeeding is enough for the availability check; dispose the stream.
using (System.IO.Stream stream = wc.OpenRead(url)) { }
Your mileage may vary, obviously. Also of note: I'm using ASP.NET 4.0.30319.
I am trying to screen scrape a page of a web app that just contains text and is hosted by a third party. It's not a properly formed HTML page, but the text that is displayed will tell us if the web app is up or down.
When I try to scrape the screen, it returns an error at the WebRequest call: "The remote server returned an error: (500) Internal Server Error."
public void ScrapeScreen()
{
    try
    {
        var url = textBox1.Text;
        var request = WebRequest.Create(url);
        // using blocks dispose the response, stream, and reader even on failure.
        using (var response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            richTextBox1.Text = reader.ReadToEnd();
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
Any ideas how I can get the text from the page?
Some sites don't like the default UserAgent. Consider changing it to something real, like:
((HttpWebRequest)request).UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4";
First, try this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
However, if you're just looking for text and don't have to POST any data to the server, you may want to look at the WebClient class. It more closely resembles a real browser and takes care of a lot of HTTP header details that you may otherwise end up having to tweak if you stick with the HttpWebRequest class.
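For the simple fetch-the-text case here, a WebClient version stays short. A sketch reusing the question's url and richTextBox1:

using (var client = new WebClient())
{
    // Same idea as above: a real-looking User-Agent keeps picky servers happy.
    client.Headers[HttpRequestHeader.UserAgent] =
        "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4";
    richTextBox1.Text = client.DownloadString(url);
}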