C# How can I check if a URL exists/is valid?

C# How can I check if a URL exists/is valid? - c#

I am making a simple program in visual c# 2005 that looks up a stock symbol on Yahoo! Finance, downloads the historical data, and then plots the price history for the specified ticker symbol.
I know the exact URL that I need to acquire the data, and if the user inputs an existing ticker symbol (or at least one with data on Yahoo! Finance) it works perfectly fine. However, I have a run-time error if the user makes up a ticker symbol, as the program tries to pull data from a non-existent web page.
I am using the WebClient class, and using the DownloadString function. I looked through all the other member functions of the WebClient class, but didn't see anything I could use to test a URL.
How can I do this?

Here is another implementation of this solution:
using System.Net;
///
/// Checks the file exists or not.
///
/// The URL of the remote file.
/// True : If the file exits, False if file not exists
private bool RemoteFileExists(string url)
{
try
{
//Creating the HttpWebRequest
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
//Setting the Request method HEAD, you can also use GET too.
request.Method = "HEAD";
//Getting the Web Response.
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
//Returns TRUE if the Status code == 200
response.Close();
return (response.StatusCode == HttpStatusCode.OK);
}
catch
{
//Any exception will returns false.
return false;
}
}
From: http://www.dotnetthoughts.net/2009/10/14/how-to-check-remote-file-exists-using-c/

You could issue a "HEAD" request rather than a "GET"?
So to test a URL without the cost of downloading the content:
// using MyClient from linked post
using(var client = new MyClient()) {
client.HeadOnly = true;
// fine, no content downloaded
string s1 = client.DownloadString("http://google.com");
// throws 404
string s2 = client.DownloadString("http://google.com/silly");
}
You would try/catch around the DownloadString to check for errors; no error? It exists...
With C# 2.0 (VS2005):
private bool headOnly;
public bool HeadOnly {
get {return headOnly;}
set {headOnly = value;}
}
and
using(WebClient client = new MyClient())
{
// code as before
}

These solutions are pretty good, but they are forgetting that there may be other status codes than 200 OK. This is a solution that I've used on production environments for status monitoring and such.
If there is a url redirect or some other condition on the target page, the return will be true using this method. Also, GetResponse() will throw an exception and hence you will not get a StatusCode for it. You need to trap the exception and check for a ProtocolError.
Any 400 or 500 status code will return false. All others return true.
This code is easily modified to suit your needs for specific status codes.
/// <summary>
/// This method will check a url to see that it does not return server or protocol errors
/// </summary>
/// <param name="url">The path to check</param>
/// <returns></returns>
public bool UrlIsValid(string url)
{
try
{
HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
request.Timeout = 5000; //set the timeout to 5 seconds to keep the user from waiting too long for the page to load
request.Method = "HEAD"; //Get only the header information -- no need to download any content
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
int statusCode = (int)response.StatusCode;
if (statusCode >= 100 && statusCode < 400) //Good requests
{
return true;
}
else if (statusCode >= 500 && statusCode <= 510) //Server Errors
{
//log.Warn(String.Format("The remote server has thrown an internal error. Url is not valid: {0}", url));
Debug.WriteLine(String.Format("The remote server has thrown an internal error. Url is not valid: {0}", url));
return false;
}
}
}
catch (WebException ex)
{
if (ex.Status == WebExceptionStatus.ProtocolError) //400 errors
{
return false;
}
else
{
log.Warn(String.Format("Unhandled status [{0}] returned for url: {1}", ex.Status, url), ex);
}
}
catch (Exception ex)
{
log.Error(String.Format("Could not test url {0}.", url), ex);
}
return false;
}

If I understand your question correctly, you could use a small method like this to give you the results of your URL test:
WebRequest webRequest = WebRequest.Create(url);
WebResponse webResponse;
try
{
webResponse = webRequest.GetResponse();
}
catch //If exception thrown then couldn't get response from address
{
return 0;
}
return 1;
You could wrap the above code in a method and use it to perform validation. I hope this answers the question you were asking.

Try this (Make sure you use System.Net):
public bool checkWebsite(string URL) {
try {
WebClient wc = new WebClient();
string HTMLSource = wc.DownloadString(URL);
return true;
}
catch (Exception) {
return false;
}
}
When the checkWebsite() function gets called, it tries to get the source code of
the URL passed into it. If it gets the source code, it returns true. If not,
it returns false.
Code Example:
//The checkWebsite command will return true:
bool websiteExists = this.checkWebsite("https://www.google.com");
//The checkWebsite command will return false:
bool websiteExists = this.checkWebsite("https://www.thisisnotarealwebsite.com/fakepage.html");

I have always found Exceptions are much slower to be handled.
Perhaps a less intensive way would yeild a better, faster, result?
public bool IsValidUri(Uri uri)
{
using (HttpClient Client = new HttpClient())
{
HttpResponseMessage result = Client.GetAsync(uri).Result;
HttpStatusCode StatusCode = result.StatusCode;
switch (StatusCode)
{
case HttpStatusCode.Accepted:
return true;
case HttpStatusCode.OK:
return true;
default:
return false;
}
}
}
Then just use:
IsValidUri(new Uri("http://www.google.com/censorship_algorithm"));

A lot of the answers are older than HttpClient (I think it was introduced in Visual Studio 2013) or without async/await functionality, so I decided to post my own solution:
private static async Task<bool> DoesUrlExists(String url)
{
try
{
using (HttpClient client = new HttpClient())
{
//Do only Head request to avoid download full file
var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Head, url));
if (response.IsSuccessStatusCode) {
//Url is available is we have a SuccessStatusCode
return true;
}
return false;
}
} catch {
return false;
}
}
I use HttpClient.SendAsync with HttpMethod.Head to make only a head request, and not downlaod the whole file. Like David and Marc already say there is not only http 200 for ok, so I use IsSuccessStatusCode to allow all Sucess Status codes.

WebRequest request = WebRequest.Create("http://www.google.com");
try
{
request.GetResponse();
}
catch //If exception thrown then couldn't get response from address
{
MessageBox.Show("The URL is incorrect");`
}

This solution seems easy to follow:
public static bool isValidURL(string url) {
WebRequest webRequest = WebRequest.Create(url);
WebResponse webResponse;
try
{
webResponse = webRequest.GetResponse();
}
catch //If exception thrown then couldn't get response from address
{
return false ;
}
return true ;
}

Here is another option
public static bool UrlIsValid(string url)
{
bool br = false;
try {
IPHostEntry ipHost = Dns.Resolve(url);
br = true;
}
catch (SocketException se) {
br = false;
}
return br;
}

A lot of other answers are using WebRequest which is now obsolete.
Here is a method that has minimal code and uses currently up-to-date classes and methods.
I have also tested the other most up-voted functions which can produce false positives.
I tested with these URLs, which points to the Visual Studio Community Installer, found on this page.
//Valid URL
https://aka.ms/vs/17/release/vs_community.exe
//Invalid URL, redirects. Produces false positive on other methods.
https://aka.ms/vs/14/release/vs_community.exe
using System.Net;
using System.Net.Http;
//HttpClient is not meant to be created and disposed frequently.
//Declare it staticly in the class to be reused.
static HttpClient client = new HttpClient();
/// <summary>
/// Checks if a remote file at the <paramref name="url"/> exists, and if access is not restricted.
/// </summary>
/// <param name="url">URL to a remote file.</param>
/// <returns>True if the file at the <paramref name="url"/> is able to be downloaded, false if the file does not exist, or if the file is restricted.</returns>
public static bool IsRemoteFileAvailable(string url)
{
//Checking if URI is well formed is optional
Uri uri = new Uri(url);
if (!uri.IsWellFormedOriginalString())
return false;
try
{
using (HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Head, uri))
using (HttpResponseMessage response = client.Send(request))
{
return response.IsSuccessStatusCode && response.Content.Headers.ContentLength > 0;
}
}
catch
{
return false;
}
}
Just note that this will not work with .NET Framework, as HttpClient.Send does not exist.
To get it working on .NET Framework you will need to change client.Send(request) to client.SendAsync(request).Result.

Web servers respond with a HTTP status code indicating the outcome of the request e.g. 200 (sometimes 202) means success, 404 - not found etc (see here). Assuming the server address part of the URL is correct and you are not getting a socket timeout, the exception is most likely telling you the HTTP status code was other than 200. I would suggest checking the class of the exception and seeing if the exception carries the HTTP status code.
IIRC - The call in question throws a WebException or a descendant. Check the class name to see which one and wrap the call in a try block to trap the condition.

i have a more simple way to determine weather a url is valid.
if (Uri.IsWellFormedUriString(uriString, UriKind.RelativeOrAbsolute))
{
//...
}

Following on from the examples already given, I'd say, it's best practice to also wrap the response in a using like this
public bool IsValidUrl(string url)
{
try
{
var request = WebRequest.Create(url);
request.Timeout = 5000;
request.Method = "HEAD";
using (var response = (HttpWebResponse)request.GetResponse())
{
response.Close();
return response.StatusCode == HttpStatusCode.OK;
}
}
catch (Exception exception)
{
return false;
}
}

Related

Status code 301 not showing correctly in C#

I am able to get numbers with enum as suggested by dtb in Getting Http Status code number (200, 301, 404, etc.) from HttpWebRequest and HttpWebResponse. However, for moved permanently site also i am getting 200 (OK). What I want to see is 301 instead. Please help. My code is below. What could be wrong/needs to be corrected?
public int GetHeaders(string url)
{
//HttpStatusCode result = default(HttpStatusCode);
int result = 0;
var request = HttpWebRequest.Create(url);
request.Method = "HEAD";
try
{
using (var response = request.GetResponse() as HttpWebResponse)
{
if (response != null)
{
result = (int)response.StatusCode; // response.StatusCode;
response.Close();
}
}
}
catch (WebException we)
{
if (we.Response != null)
{
result = (int)((HttpWebResponse)we.Response).StatusCode;
}
}
return result;
}
The tool where i am using this code is capable of showing 404, not existing domains but it is ignoring the redirects and shows the details about the redirected url. e.g if i put my older domain easytipsandtricks.com in the text field, it shows the results for tipscow.com (if you check easytipsandtricks.com in any checker tool online, you will notice that it is giving the correct redirect message - 301 Moved). Please help.

You need to set HttpWebRequest.AllowAutoRedirect to false (default is true) for it to not automatically follow redirects (30x responses).
If AllowAutoRedirect is set to false, all responses with an HTTP status code from 300 to 399 is returned to the application.
Sample:
var request = (HttpWebRequest)HttpWebRequest.Create(url);
request.Method = "HEAD";
request.AllowAutoRedirect = false;

Sending a http request in C# and catching network issues

I previously had a small VBScript that would test if a specific website was accessible by sending a GET request. The script itself was extremely simple and did everything I needed:
Function GETRequest(URL) 'Sends a GET http request to a specific URL
Dim objHttpRequest
Set objHttpRequest = CreateObject("MSXML2.XMLHTTP.3.0")
objHttpRequest.Open "GET", URL, False
On Error Resume Next 'Error checking in case access is denied
objHttpRequest.Send
GETRequest = objHttpRequest.Status
End Function
I now want to include this sort of functionality in an expanded C# application. However I've been unable to get the same results my previous script provided.
Using code similar to what I've posted below sort of gets me a proper result, but fails to run if my network connection has failed.
public static void GETRequest()
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://url");
request.Method = "GET";
HttpStatusCode status;
HttpWebResponse response;
try
{
response = (HttpWebResponse)request.GetResponse();
status = response.StatusCode;
Console.WriteLine((int)response.StatusCode);
Console.WriteLine(status);
}
catch (WebException e)
{
status = ((HttpWebResponse)e.Response).StatusCode;
Console.WriteLine(status);
}
}
But as I said, I need to know if the site is accessible, not matter the reason: the portal could be down, or the problem might reside on the side of the PC that's trying to access it. Either way: I don't care.
When I used MSXML2.XMLHTTP.3.0 in the script I was able to get values ranging from 12000 to 12156 if I was having network problems. I would like to have the same functionality in my C# app, that way I could at least write a minimum of information to a log and let the computer act accordingly. Any ideas?

A direct translation of your code would be something like this:
static void GetStatusCode(string url)
{
dynamic httpRequest = Activator.CreateInstance(Type.GetTypeFromProgID("MSXML2.XMLHTTP.3.0"));
httpRequest.Open("GET", url, false);
try { httpRequest.Send(); }
catch { }
finally { Console.WriteLine(httpRequest.Status); }
}
It's as small and simple as your VBScript script, and uses the same COM object to send the request.
This code happily gives me error code like 12029 ERROR_WINHTTP_CANNOT_CONNECT or 12007 ERROR_WINHTTP_NAME_NOT_RESOLVED etc.

If the code is failing only when you don't have an available network connection, you can use GetIsNetworkAvailable() before executing your code. This method will return a boolean indicating if a network connection is available or not. If it returns false, you could execute an early return / notify the user, and if not, continue.
System.Net.NetworkInformation.NetworkInterface.GetIsNetworkAvailable()
using the code you provided above:
public static void GETRequest()
{
if (!System.Net.NetworkInformation.NetworkInterface.GetIsNetworkAvailable())
return; //or alert the user there is no connection
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://url");
request.Method = "GET";
HttpStatusCode status;
HttpWebResponse response;
try
{
response = (HttpWebResponse)request.GetResponse();
status = response.StatusCode;
Console.WriteLine((int)response.StatusCode);
Console.WriteLine(status);
}
catch (WebException e)
{
status = ((HttpWebResponse)e.Response).StatusCode;
Console.WriteLine(status);
}
}

This should work for you, i've used it many times before, cut it down a bit for your needs: -
private static string GetStatusCode(string url)
{
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Method = WebRequestMethods.Http.Get;
req.ProtocolVersion = HttpVersion.Version11;
req.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
try
{
HttpWebResponse response = (HttpWebResponse)req.GetResponse();
StringBuilder sb = new StringBuilder();
foreach (string header in response.Headers)
{
sb.AppendLine(string.Format("{0}: {1}", header, response.GetResponseHeader(header)));
}
return string.Format("Response Status Code: {0}\nServer:{1}\nProtocol: {2}\nRequest Method: {3}\n\n***Headers***\n\n{4}", response.StatusCode,response.Server, response.ProtocolVersion, response.Method, sb);
}
catch (Exception e)
{
return string.Format("Error: {0}", e.ToString());
}
}
Feel free to ignore the section that gets the headers

series of httpWebRequests

I'm currently writing a simple app that performs a series of requests to the web server and I've encountered a strange... feature?
I don't need response stream of the request, but only status code. So, for each piece of my data I call my own "Send" method:
public static int Send(string uri)
{
HttpWebRequest request = null;
HttpWebResponse response = null;
try
{
request = (HttpWebRequest)WebRequest.Create(uri);
response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK) return 0;
}
catch (Exception e)
{
if (request != null) request.Abort();
}
return -1;
}
Works fine? Yes, unless I call this function at least twice. Second call of such a function in a row (with the same uri) will ALWAYS result in timeout.
Now, that's odd: if I add request.Abort(); when I return zero (here, when status code is 200) - everything ALWAYS works fine.
So my question is - why? Is it some kind of framework restriction, or maybe the some kind of anti-DOS protection on the particular server (unfortunately, the server is a black box for me)? Or maybe I just don't understand smth in how it all works?

Try to dispose of the web response, you may leak some resources
public static int Send(string uri)
{
HttpWebRequest request = null;
try
{
request = (HttpWebRequest)WebRequest.Create(uri);
using (var response = (HttpWebResponse)request.GetResponse())
{
if (response.StatusCode == HttpStatusCode.OK) return 0;
}
}
catch (Exception e)
{
if (request != null) request.Abort();
}
return -1;
}
There is also a default number of connections (2 I think, but you can configure this) you can make to a domain simultaneously, please see this SO question. You're probably hitting this limit with your unclosed responses.

First of all I'd make a series of changes in order to get to the root of this:
take out that try..catch{} (you're likely swallowing an exception)
return a boolean instead of a number
You should then get your exception information you need.
Also you should be using "HEAD" as your method as you only want the status code:
request.Method = "HEAD";
read the difference here.

WebRequest Exception in .NET

I use this code snippet that verifies if the file specified in the URL exists and keep trying it every few seconds for every user. Sometimes (mostly when there are large number of users using the site) the code doesn't work.
[WebMethod()]
public static string GetStatus(string URL)
{
bool completed = false;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
try
{
if (response.StatusCode == HttpStatusCode.OK)
{
completed = true;
}
}
catch (Exception)
{
//Just don't do anything. Retry after few seconds
}
}
return completed.ToString();
}
When I look at the Windows Event logs there are several errors:
Unable to read data from the transport connection. An existing connection was forcibly closed
The Operation has timed out
The remote host closed the connection. The error code is 0x800703E3
When I restart the IIS, things work fine until the next time this happens.

You are putting the try/catch inside the using statement while it's the request.GetResponse method that might throw:
bool completed = false;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
try
{
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
if (response.StatusCode == HttpStatusCode.OK)
{
completed = true;
}
}
}
catch (Exception)
{
//Just don't do anything. Retry after few seconds
}
return completed.ToString();

Is there a faster way to check if an external web page exists?

I wrote this method to check if a page exists or not:
protected bool PageExists(string url)
{
try
{
Uri u = new Uri(url);
WebRequest w = WebRequest.Create(u);
w.Method = WebRequestMethods.Http.Head;
using (StreamReader s = new StreamReader(w.GetResponse().GetResponseStream()))
{
return (s.ReadToEnd().Length >= 0);
}
}
catch
{
return false;
}
}
I am using it to check a set of pages (iterates from AAAA-AAAZ), and it takes between 3 and 7 seconds to run the entire loop. Is there a faster or more efficient way to do this?

I think your approach is rather good, but would change it into only downloading the headers by adding w.Method = WebRequestMethods.Http.Head; before calling GetResponse.
This could do it:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.example.com");
request.Method = WebRequestMethods.Http.Head;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
bool pageExists = response.StatusCode == HttpStatusCode.OK;
You may probably want to check for other status codes as well.

static bool GetCheck(string address)
{
try
{
HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
request.Method = "GET";
request.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
var response = request.GetResponse();
return (response.Headers.Count > 0);
}
catch
{
return false;
}
}
static bool HeadCheck(string address)
{
try
{
HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
request.Method = "HEAD";
request.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
var response = request.GetResponse();
return (response.Headers.Count > 0);
}
catch
{
return false;
}
}
Beware, certain pages (eg. WCF .svc files) may not return anything from a head request. I know because I'm working around this right now.
EDIT - I know there are better ways to check the return data than counting headers, but this is a copy/paste from stuff where this is important to us.

One obvious speedup is to run several requests in parallel - most of the time will be spent on IO, so spawning 10 threads to each check a page will complete the whole iteration around 10 times faster.

You could do it using asynchronous way, because now you are waiting for results after each request. For few pages, you could just throw your function in ThreadPool, and wait for all requests to finish. For more requests, you could use asynchronous methods for your ResponseStream() (BeginRead etc.).
The other thing that can help you (help me for sure) is to clear .Proxy property:
w.Proxy = null;
Without this, at least 1st request is much slower, at least on my machine.
3. You can not download whole page, but download only header, by setting .Method to "HEAD".

I simply used Fredrik Mörk answer above but placed it within a method:
private bool checkURL(string url)
{
bool pageExists = false;
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = WebRequestMethods.Http.Head;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
pageExists = response.StatusCode == HttpStatusCode.OK;
}
catch (Exception e)
{
//Do what ever you want when its no working...
//Response.Write( e.ToString());
}
return pageExists;
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# How can I check if a URL exists/is valid? - c#

WebRequest request = WebRequest.Create("http://www.google.com"); try { request.GetResponse(); } catch //If exception thrown then couldn't get response from address { MessageBox.Show("The URL is incorrect");` }

Here is another option public static bool UrlIsValid(string url) { bool br = false; try { IPHostEntry ipHost = Dns.Resolve(url); br = true; } catch (SocketException se) { br = false; } return br; }

i have a more simple way to determine weather a url is valid. if (Uri.IsWellFormedUriString(uriString, UriKind.RelativeOrAbsolute)) { //... }

Related

Status code 301 not showing correctly in C#

Sending a http request in C# and catching network issues

series of httpWebRequests

WebRequest Exception in .NET

Is there a faster way to check if an external web page exists?

Categories

Resources