I'm using the script below to retrieve the HTML from a URL.
string webURL = @"https://nl.wiktionary.org/wiki/" + word.ToLower();
using (WebClient client = new WebClient())
{
    string htmlCode = client.DownloadString(webURL);
}
The variable word can be any word. When there is no wiki page for the word being retrieved, the code fails with a 404 error, while retrieving the same URL with a browser opens a wiki page saying there is no entry for this item yet.
What I want is for the code to always get the HTML, even when the wiki page says there is no info yet. I do not want to suppress the 404 error with a try/catch.
Does anyone have an idea why this is not working with WebClient?
Try this. You can capture the content of the 404 error page in a try/catch block.
var word = Console.ReadLine();
string webURL = @"https://nl.wiktionary.org/wiki/" + word.ToLower();
using (WebClient client = new WebClient())
{
    try
    {
        string htmlCode = client.DownloadString(webURL);
    }
    catch (WebException exception)
    {
        // The body of the 404 page is still available on the exception's response.
        string responseText = string.Empty;
        var responseStream = exception.Response?.GetResponseStream();
        if (responseStream != null)
        {
            using (var reader = new StreamReader(responseStream))
            {
                responseText = reader.ReadToEnd();
            }
        }
        Console.WriteLine(responseText);
    }
}
Console.ReadLine();
Since this wiki server uses case-sensitive URL mapping, just don't modify the case of the URL you harvest (remove ".ToLower()" from your code).
Ex.:
Lower case:
https://nl.wiktionary.org/wiki/categorie:onderwerpen_in_het_nynorsk
Result: HTTP 404 (Not Found)
Normal (unmodified) case:
https://nl.wiktionary.org/wiki/Categorie:Onderwerpen_in_het_Nynorsk
Result: HTTP 200 (OK)
Also, keep in mind that most (if not all) wiki servers (including this one) generate custom 404 pages, so in a browser they look like "normal" pages, but despite this they are served with a 404 HTTP status code.
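If you would rather avoid the try/catch entirely, note that HttpClient (unlike WebClient) does not throw on non-success status codes, so you can always read the body. A minimal sketch; "voorbeeld" is just a placeholder word:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string word = "voorbeeld"; // placeholder; keep the original casing, as noted above
        string webURL = "https://nl.wiktionary.org/wiki/" + Uri.EscapeDataString(word);

        using (var client = new HttpClient())
        {
            // GetAsync does not throw on 404; the status is reported on the response object.
            HttpResponseMessage response = await client.GetAsync(webURL);
            string htmlCode = await response.Content.ReadAsStringAsync();
            Console.WriteLine($"Status: {(int)response.StatusCode}, length: {htmlCode.Length}");
        }
    }
}

This way you get the HTML of the custom 404 page and the status code without any exception handling.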
Hey guys,
I have a problem with my code. Since about a week ago, my code has stopped working without any changes. I am pretty sure that my code should work. All I get is error 403: Forbidden.
Below is a snippet of my code. I also read about adding a header to the WebClient, which did not help. Any other suggestions? I am sorry if my syntax is not that good, it is my first post on Stack Overflow.
Thanks in advance!
string epicId = "ManuelNotManni";
WebClient webClient = new WebClient();
Uri uri = new Uri("https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/");
string result = String.Empty;
try
{
    string website = $"{uri.ToString()}{epicId}?";
    result = webClient.DownloadString(website);
}
catch (Exception ex)
{
    Console.WriteLine($"Error:\n{ex}");
    Console.ReadLine();
}
finally
{
    webClient.Dispose();
}
This is the exact error:
System.Net.WebException: The remote server returned an error: (403) Forbidden.
at System.Net.HttpWebRequest.GetResponse()
at System.Net.WebClient.GetWebResponse(WebRequest request)
at System.Net.WebClient.DownloadBits(WebRequest request, Stream writeStream)
at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
at System.Net.WebClient.DownloadString(Uri address)
at System.Net.WebClient.DownloadString(String address)
at TestProject.Program.Main(String[] args) in C:\Users\Manue\source\repos\TestProject\Program.cs:line 17
You're right, your code should work fine.
The issue is the URL you're requesting, which is actually:
https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/ManuelNotManni?
This returns a 403 status code in any case, no matter whether you use a browser, your code, or, for example, Postman.
I suggest having a look at the response body using Postman.
It shows this:
<html class="no-js" lang="en-US">
<!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta name="captcha-bypass" id="captcha-bypass" />
Tracker.gg wants API users to register their apps with them before they're given access to the API.
What you need to do is to first head to their Getting Started page. Here you will have to create an app, which should give you an authentication key.
When you have done this, you want to change your code slightly to add the authentication header. For example:
var webClient = new WebClient();
webClient.Headers.Add("TRN-Api-Key", "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX");
As a sidenote, WebClient has been deprecated and it's recommended to use HttpClient from now on. Here's your code with HttpClient instead:
var epicId = "ManuelNotManni";
var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.Add("TRN-Api-Key", "YOUR API KEY GOES HERE");
// Simplifying Uri creation:
var uri = new Uri($"https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/{epicId}");
var result = string.Empty; // C# prefers lowercase string
try
{
    var response = await httpClient.GetAsync(uri); // requires an async context, e.g. async Task Main
    if (response.IsSuccessStatusCode)
    {
        result = await response.Content.ReadAsStringAsync();
    }
    else
    {
        Console.WriteLine($"Unable to retrieve data for {epicId}.");
        Console.WriteLine($"Statuscode: {response.StatusCode}");
        Console.WriteLine($"Reason: {response.ReasonPhrase}");
    }
}
catch (Exception ex)
{
    Console.WriteLine($"Error:\n{ex}");
    Console.ReadLine();
}
finally
{
    // Note: in a long-lived application you would normally reuse a single
    // HttpClient instance rather than disposing it after every request.
    httpClient.Dispose();
}
This happens when we violate a firewall rule set by Cloudflare; you can visit this blog post for more details:
https://community.cloudflare.com/t/community-tip-fixing-error-1020-access-denied/66439
I have a website like this: http://www.lfp.fr/ligue1/feuille_match/52255 and I want to switch between the tabs infoMatch and Statistiques, but it shows me the data of the first page only. When I use Firebug to check the response, it gives me this:
GET showStatsJoueursMatch?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes
string url = "http://www.lfp.fr/ligue1/feuille_match/52255";
string getData = "?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes";
System.Uri uriObj = new System.Uri(url);
String Methode = "GET";
lgRequest = (HttpWebRequest)WebRequest.CreateDefault(uriObj);
lgRequest.Method = Methode;
lgRequest.ContentType = "text/html";
SetRequestHeader("Accept", "text/html");
SetRequestHeader("Cache-Control", "no-cache");
SetRequestHeader("Content-Length", getData.Length.ToString());
StreamWriter stream = new StreamWriter(lgRequest.GetRequestStream(), Encoding.ASCII);
stream.Write(getData);
stream.Close();
lgResponse = (HttpWebResponse)lgRequest.GetResponse();
But it gives me the error "Cannot send a content-body with this verb-type." And when I use "POST" as the method, it gives an HTML response, but only the first page's data, not Statistiques.
Try the following address: http://www.lfp.fr/ligue1/feuille_match/showStatsJoueursMatch?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes
Just like that:
using System;
using System.Net;

class Program
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            string result = client.DownloadString("http://www.lfp.fr/ligue1/feuille_match/showStatsJoueursMatch?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes");
            Console.WriteLine(result);
        }
    }
}
Notice that I have used a WebClient instead of a WebRequest, which makes the code much shorter and easier to understand.
Once you have downloaded the HTML from the remote site you might consider using an HTML parsing library such as HTML Agility Pack to extract the useful information from the markup you have scraped.
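For example, a minimal sketch using the HTML Agility Pack NuGet package (the //tr XPath is purely illustrative, not the actual structure of the lfp.fr page):

using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class Scraper
{
    static void Main()
    {
        var web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://www.lfp.fr/ligue1/feuille_match/showStatsJoueursMatch?matchId=52255&domId=112&extId=24&live=0&domNomClub=AJ+Auxerre&extNomClub=FC+Nantes");

        // SelectNodes returns null when nothing matches, so check before iterating.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows != null)
        {
            foreach (var row in rows)
                Console.WriteLine(row.InnerText.Trim());
        }
    }
}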
Does anyone know how to check if a webpage is asking for HTTP Authentication via C# using the WebRequest class? I'm not asking how to post Credentials to the page, just how to check if the page is asking for Authentication.
Current Snippet to get HTML:
WebRequest wrq = WebRequest.Create(address);
WebResponse wrs = wrq.GetResponse();
Uri uri = wrs.ResponseUri;
StreamReader strdr = new StreamReader(wrs.GetResponseStream());
string html = strdr.ReadToEnd();
wrs.Close();
strdr.Close();
return html;
PHP Server side source:
<?php
if (!isset($_SERVER['PHP_AUTH_USER'])) {
    header('WWW-Authenticate: Basic realm="Secure Sign-in"');
    header('HTTP/1.0 401 Unauthorized');
    echo 'Text to send if user hits Cancel button';
    exit;
} else {
    echo "<p>Hello {$_SERVER['PHP_AUTH_USER']}.</p>";
    echo "<p>You entered {$_SERVER['PHP_AUTH_PW']} as your password.</p>";
}
?>
WebRequest.GetResponse returns an object of type HttpWebResponse. Just cast it and you can retrieve the StatusCode.
However, .NET will throw an exception if it receives a response with a 4xx or 5xx status code (thanks for your feedback).
There is a little workaround; check it out:
HttpWebRequest wrq = (HttpWebRequest)WebRequest.Create(@"http://webstrand.comoj.com/locked/safe.php");
HttpWebResponse wrs = null;
try
{
    wrs = (HttpWebResponse)wrq.GetResponse();
}
catch (System.Net.WebException protocolError)
{
    if (((HttpWebResponse)protocolError.Response).StatusCode == HttpStatusCode.Unauthorized)
    {
        //do something
    }
}
catch (System.Exception generalError)
{
    //run to the hills
}
// wrs stays null if an exception was thrown above, so guard before using it.
if (wrs != null && wrs.StatusCode == HttpStatusCode.OK)
{
    Uri uri = wrs.ResponseUri;
    StreamReader strdr = new StreamReader(wrs.GetResponseStream());
    string html = strdr.ReadToEnd();
    wrs.Close();
    strdr.Close();
}
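If you specifically want to know whether the page is asking for authentication, rather than just that the request failed, you can also read the WWW-Authenticate header from the 401 response. A sketch along the lines of the code above (the URL is the one from this answer):

using System;
using System.Net;

class AuthCheck
{
    // Returns the WWW-Authenticate challenge if the page requests authentication, otherwise null.
    static string GetAuthChallenge(string address)
    {
        HttpWebRequest wrq = (HttpWebRequest)WebRequest.Create(address);
        try
        {
            using (wrq.GetResponse())
            {
                return null; // 2xx response: no authentication requested
            }
        }
        catch (WebException protocolError)
        {
            var wrs = protocolError.Response as HttpWebResponse;
            if (wrs != null && wrs.StatusCode == HttpStatusCode.Unauthorized)
            {
                // e.g. "Basic realm=\"Secure Sign-in\"" for the PHP page above
                return wrs.Headers["WWW-Authenticate"];
            }
            return null;
        }
    }

    static void Main()
    {
        string challenge = GetAuthChallenge("http://webstrand.comoj.com/locked/safe.php");
        Console.WriteLine(challenge ?? "No authentication requested");
    }
}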
Hope this helps.
Regards
Might want to try
WebClient wc = new WebClient();
CredentialCache credCache = new CredentialCache();
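Those two lines are only a starting point; here is a sketch of how they fit together, assuming Basic authentication and placeholder credentials (the URL is taken from the answer above):

using System;
using System.Net;

class Program
{
    static void Main()
    {
        var address = new Uri("http://webstrand.comoj.com/locked/safe.php");
        var credCache = new CredentialCache();
        // Placeholder credentials - replace with real ones.
        credCache.Add(address, "Basic", new NetworkCredential("user", "password"));

        using (var wc = new WebClient())
        {
            wc.Credentials = credCache;
            Console.WriteLine(wc.DownloadString(address));
        }
    }
}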
If you can work with WebClient instead of WebRequest, you should; it's a bit higher level and makes things like headers easier to handle.
Also, might want to check this thread:
System.Net.WebClient fails weirdly
I'm trying to access the last.fm APIs via C#. As a first test I'm querying similar artists, if that matters.
I get an XML response when I pass a correct artist name, e.g. "Nirvana". My problem is that when I deliver an invalid name (e.g. "Nirvana23") I don't receive XML but an error code (403 or 400) and a WebException.
Interesting thing: if I enter the URL in a browser (tested with Firefox and Chrome), I receive the XML file I want (containing a last.fm-specific error message).
I tried both XmlReader and XDocument:
XDocument doc = XDocument.Load(requestUrl);
and HttpWebRequest:
string httpResponse = "";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUrl);
HttpWebResponse response = null;
StreamReader reader = null;
try
{
    response = (HttpWebResponse)request.GetResponse();
    reader = new StreamReader(response.GetResponseStream());
    httpResponse = reader.ReadToEnd();
}
catch (WebException) { /* this is where the 403/400 ends up */ }
The URL is something like "http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=Nirvana23" (plus a specific key given by last.fm, but even without it, it should return XML). A link to give it a try: link (this is the error file I cannot access via C#).
What I also tried (without success): comparing the requests sent by the browser and by my program with the help of Wireshark. Then I added some headers to the request, but that didn't help either.
In .NET, WebRequest converts HTTP error codes into exceptions, while your browser just ignores them since the response is not empty. If you catch the exception, the GetResponseStream method should still return the error XML that you are expecting.
Edit:
Try this:
string httpResponse = "";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUrl);
WebResponse response = null;
StreamReader reader = null;
try
{
    response = request.GetResponse();
}
catch (WebException ex)
{
    if (ex.Response == null)
        throw; // no HTTP response at all (e.g. DNS failure or timeout)
    response = ex.Response;
}
reader = new StreamReader(response.GetResponseStream());
httpResponse = reader.ReadToEnd();
Why don't you catch the exception and then process it accordingly? If you want to display a custom error, you can also do that in your catch block.
I'm querying Wikipedia using the following code, but I always get an error (403 Forbidden). When I type the exact same URL in my browser, however, it works. I've used the same code before to query other web APIs, so I am not sure what's causing the trouble.
private static string query(string text)
{
    text = text.Replace(" ", "%20");
    string url = "http://en.wikipedia.org/w/api.php?action=opensearch&search=" + text + "&format=json&callback=spellcheck";
    WebClient client = new WebClient();
    client.Headers.Add("User-Agent", "whatever"); // <-- this line was missing
    try
    {
        string response = client.DownloadString(url);
        return response;
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
        return null;
    }
}
Try setting the User-Agent header to something that matches your browser. If this doesn't work, fire up Fiddler, take a peek at your browser's headers and copy them to your web request.
http://msdn.microsoft.com/en-us/library/system.net.webclient.headers.aspx
EDIT
The advice I gave was generic. Please observe the policies of the website you are downloading from, as spoofing a browser user-agent may contravene policy or be considered malicious by default:
http://meta.wikimedia.org/wiki/User-Agent_policy :
Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.
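In line with that policy, a descriptive User-Agent that identifies your client and gives a way to contact you is the safer choice. A sketch with placeholder values:

using System;
using System.Net;

class WikiQuery
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // Identify your tool honestly; the name and contact details are placeholders.
            client.Headers.Add("User-Agent",
                "MySpellcheckBot/1.0 (https://example.org/contact; bot@example.org)");
            string response = client.DownloadString(
                "http://en.wikipedia.org/w/api.php?action=opensearch&search=test&format=json&callback=spellcheck");
            Console.WriteLine(response);
        }
    }
}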