Check the validity of a weblink - c#

I have a WinForms application that connects to a server through a PHP script hosted online. I store the script's address in the application's settings, like so:
http://server.webhost.com/file/uploadimage.html
Then, to pass this address to my program, I simply read the following:
Settings.Default.ServerAddress;
To send my file to the server, I call a method like this:
UploadToServer.HttpUploadFile(Settings.Default.ServerAddress , sfd.FileName.ToString(), "file", "image/jpeg", nvc);
However, I have no idea how to check that the address entered is actually valid. Is there a best practice for that?

One way to make sure that a URL works is to actually request it. You can make the check cheaper by sending a HEAD request only, like this:
try
{
HttpWebRequest request = HttpWebRequest.Create("yoururl") as HttpWebRequest;
request.Method = "HEAD"; //Get only the header information -- no need to download any content
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
int statusCode = (int)response.StatusCode;
if (statusCode >= 100 && statusCode < 400) //Good requests
{
}
else //if (statusCode >= 500 && statusCode <= 510) //Server Errors
{
//Hard to reach here since an exception would be thrown
}
}
}
catch (WebException ex)
{
//handle exception
//something wrong with the url
}

Use System.Uri (http://msdn.microsoft.com/en-us/library/system.uri.aspx) to parse it. The constructor throws a UriFormatException if the string is not a well-formed URI. But as the others who commented state, depending on what kind of "valid" you want, this may or may not be good enough for what you are doing.
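A minimal sketch of the syntax-only check, using Uri.TryCreate so that no exception handling is needed. The class and method names are hypothetical, and restricting the scheme to http/https is an assumption about what "valid" means for an upload address. It does not contact the server.

```csharp
using System;

public static class UrlSyntax
{
    // Returns true when the text parses as an absolute http or https URL.
    // Syntax check only: the server is never contacted.
    public static bool IsWellFormedHttpUrl(string text)
    {
        Uri uri;
        return Uri.TryCreate(text, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

For example, UrlSyntax.IsWellFormedHttpUrl(Settings.Default.ServerAddress) could be called before the upload, with a HEAD request as a second step if reachability also matters.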

Related

How to validate existence of OneDrive document via its Share link?

My C#/UWP app has a section where users can enter links to OneDrive documents and other web resources as reference information. A user can click a button to test a link after they've entered it to make sure it launches as expected. I need to validate that the link target exists before launching the URI and raise an error if the link is not valid before attempting to launch the URI.
It's straight-forward to validate web sites and non-OneDrive web-based docs by creating a HttpWebRequest using the URL and evaluating the response's status value. (See sample code below.)
However, OneDrive document share links seem to have problems with this approach, returning a [405 Method Not Allowed] error. I'm guessing this is because OneDrive share links do lots of forwarding and redirection before they get to the actual document.
try
{
// Create the HTTP request
HttpWebRequest request = WebRequest.Create(urlString) as HttpWebRequest;
request.Timeout = 5000; // Set the timeout to 5 seconds -- don't keep the user waiting too long!
request.Method = "HEAD"; // Get only the header information -- no need to download page content
// Get the HTTP response
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
// Get the response status code
int statusCode = (int)response.StatusCode;
// Successful request...return true
if (statusCode >= 100 && statusCode < 400)
{
return true;
}
// Unsuccessful request (server-side errors)...return false
else // if (statusCode >= 500 && statusCode < 600)
{
Log.Error( $"URL not valid. Server error. (Status = {statusCode})" );
return false;
}
}
}
// Handle HTTP exceptions
catch (WebException e)
{
// e.Response is null when no HTTP response arrived (timeout, DNS failure, refused connection)
HttpWebResponse response = e.Response as HttpWebResponse;
if (response == null)
{
Log.Error( $"URL not valid. No HTTP response. (Status = {e.Status})" );
return false;
}
// Grab the HTTP status code from the response
int statusCode = (int)response.StatusCode;
// Unsuccessful request (client-side errors)...return false
if (statusCode >= 400 && statusCode <= 499)
{
Log.Error( $"URL not valid. Client error. (Status = {statusCode})" );
return false;
}
else // Unhandled errors
{
Log.Error( $"Unhandled status returned for URL. (Status = {e.Status})" );
return false;
}
}
// Handle non-HTTP exceptions
catch (Exception e)
{
Log.Error( $"Unexpected error. Could not validate URL." );
return false;
}
I can trap the 405 error and launch the URL anyhow using the Windows.System.Launcher.LaunchUriAsync method. The OneDrive link launches just fine...IF the OneDrive document actually exists.
But if the document doesn't exist, or if the share permissions have been revoked, I end up with a browser page with something like a [404 Not Found] error...exactly what I'm trying to avoid by doing the validation!
Is there a way to validate OneDrive share links WITHOUT actually launching them in a browser? Are there other types of links (bit.ly links, perhaps?) that also create problems in the same way? Perhaps a better question: Can I validate ALL web resources in the same way without knowing anything but the URL?
The best way to avoid the redirects and get access to an item's metadata using a sharing link is to make an API call to the shares endpoint. You'll want to encode your URL as outlined here and then pass it to the API like this:
HEAD https://api.onedrive.com/v1.0/shares/u!{encodedurl}
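The encoding step can be sketched as follows, assuming it is the scheme described in Microsoft's documentation for sharing URLs: base64-encode the URL, strip the trailing '=' padding, replace '/' with '_' and '+' with '-', and prepend "u!". The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class SharingLink
{
    // Turns a sharing URL into the "u!..." token used in the shares endpoint path.
    // Assumption: unpadded base64url encoding of the UTF-8 bytes of the URL.
    public static string EncodeSharingUrl(string sharingUrl)
    {
        string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(sharingUrl));
        return "u!" + base64.TrimEnd('=').Replace('/', '_').Replace('+', '-');
    }
}
```

The returned value already contains the "u!" prefix, so it replaces the whole u!{encodedurl} segment of the request path.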

How to check whether a web page exists or not in .NET?

I just want to know whether a web page is reachable without using the WebResponse class, because getting the response takes too long. So I want a way to do this without code like the following:
Dim url As New System.Uri("http://www.stackoverflow.com/")
Dim request As WebRequest = WebRequest.CreateDefault(url)
request.Method = "GET"
Dim response As WebResponse
Try
response = request.GetResponse()
Catch exc As WebException
response = exc.Response
End Try
You can't without using the proper classes for it, or writing your own.
My two cents: just execute the HttpWebRequest and check whether the resulting HTTP status code is 404:
try
{
HttpWebRequest q = (HttpWebRequest)WebRequest.Create(theUrl);
using (HttpWebResponse r = (HttpWebResponse)q.GetResponse())
{
if (r.StatusCode == HttpStatusCode.NotFound)
{
// page does not exist
}
}
}
catch (WebException ex)
{
// A 404 normally arrives here, because GetResponse throws for error statuses
HttpWebResponse r = ex.Response as HttpWebResponse;
if (r != null && r.StatusCode == HttpStatusCode.NotFound)
{
// page does not exist
}
}
You could create a basic socket connection to the given server on the desired port (80). If you can connect, you know that the server is online, and you can immediately close the connection without sending or receiving any data.
EDIT: My answer was of course not really correct. Connecting to the server on port 80 only verifies that the server accepts requests, not that the specific web page exists. After connecting, you could send a GET request like GET /page.html HTTP/1.1 and parse the server's answer, but for that it is much more comfortable to use WebRequest or WebClient.

Use httpwebrequest to check if url exists

I'm using a function to check if an external url exists. Here's the code with the status messages removed for clarity.
public static bool VerifyUrl(string url)
{
url.ThrowNullOrEmpty("url");
if (!(url.StartsWith("http://") || url.StartsWith("https://")))
return false;
var uri = new Uri(url);
var webRequest = HttpWebRequest.Create(uri);
webRequest.Timeout = 5000;
webRequest.Method = "HEAD";
HttpWebResponse webResponse;
try
{
webResponse = (HttpWebResponse)webRequest.GetResponse();
webResponse.Close();
}
catch (WebException)
{
return false;
}
if (string.Compare(uri.Host, webResponse.ResponseUri.Host, true) != 0)
{
string responseUri = webResponse.ResponseUri.ToString().ToLower();
if (responseUri.IndexOf("error") > -1 || responseUri.IndexOf("404.") > -1 || responseUri.IndexOf("500.") > -1)
return false;
}
return true;
}
I've run a test over some external URLs and found that about 20 out of 100 come back as errors. If I add a user agent, the error rate drops to around 14%.
The errors coming back are "forbidden" (which the user agent resolves for 6% of the URLs), "service unavailable", "method not allowed", "not implemented" or "connection closed".
Is there anything I can do to my code to ensure that more, preferably all, of the URLs give a valid response about their existence?
Alternatively, is there code that can be purchased to do this more effectively?
UPDATE - 14th Nov 12
After following advice from previous respondents, I'm now in a situation where I have a single domain that returns Service Unavailable (503). The example I have is www.marksandspencer.com.
When I use the HTTP sniffer at web-sniffer.net, as opposed to the one recommended in this thread, it works and returns the data using a GET request. However, I can't work out what I need to do to make it work in my code.
I finally got to the point of being able to validate all the URLs without exception.
Firstly, I took Davio's advice. Some domains return an error for a HEAD request, so I have included a retry for specific scenarios. The retry creates a new GET request for the second attempt.
Secondly, the Amazon scenario. Amazon was intermittently returning a 503 error for its own site and permanent 503 errors for sites hosted on the Amazon framework.
After some digging I found adding the following line to the Request resolved both. It is the Accept string used by Firefox.
var request = (HttpWebRequest)HttpWebRequest.Create(uri);
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
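The HEAD-then-GET retry could be driven by a small predicate like the one below. This is only a sketch: the class and method names are hypothetical, and the set of status codes is an assumption taken from the errors listed in the question, since the post does not say exactly which scenarios were retried.

```csharp
using System.Net;

public static class HeadFallback
{
    // Decides whether a failed HEAD request is worth repeating as a GET.
    // Assumption: these are the statuses some servers return for HEAD
    // even though the same URL answers a browser-like GET normally.
    public static bool ShouldRetryWithGet(HttpStatusCode status)
    {
        switch (status)
        {
            case HttpStatusCode.Forbidden:          // 403
            case HttpStatusCode.MethodNotAllowed:   // 405
            case HttpStatusCode.NotImplemented:     // 501
            case HttpStatusCode.ServiceUnavailable: // 503
                return true;
            default:
                return false;
        }
    }
}
```

In the catch (WebException) block of VerifyUrl, the status of the failed response could be passed to this method; when it returns true, a second request is created with Method = "GET" and the Accept header shown above before the URL is reported as broken.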

series of httpWebRequests

I'm currently writing a simple app that performs a series of requests to the web server and I've encountered a strange... feature?
I don't need response stream of the request, but only status code. So, for each piece of my data I call my own "Send" method:
public static int Send(string uri)
{
HttpWebRequest request = null;
HttpWebResponse response = null;
try
{
request = (HttpWebRequest)WebRequest.Create(uri);
response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK) return 0;
}
catch (Exception e)
{
if (request != null) request.Abort();
}
return -1;
}
Works fine? Yes, unless I call this function at least twice. Second call of such a function in a row (with the same uri) will ALWAYS result in timeout.
Now, that's odd: if I add request.Abort(); when I return zero (here, when status code is 200) - everything ALWAYS works fine.
So my question is: why? Is it some kind of framework restriction, or maybe some kind of anti-DoS protection on the particular server (unfortunately, the server is a black box to me)? Or maybe I just don't understand something about how it all works?
Try disposing of the web response; otherwise you may leak resources:
public static int Send(string uri)
{
HttpWebRequest request = null;
try
{
request = (HttpWebRequest)WebRequest.Create(uri);
using (var response = (HttpWebResponse)request.GetResponse())
{
if (response.StatusCode == HttpStatusCode.OK) return 0;
}
}
catch (Exception e)
{
if (request != null) request.Abort();
}
return -1;
}
There is also a default number of connections (2 I think, but you can configure this) you can make to a domain simultaneously, please see this SO question. You're probably hitting this limit with your unclosed responses.
First of all, I'd make a series of changes in order to get to the root of this:
take out that try..catch{} (you're likely swallowing an exception)
return a boolean instead of a number
You should then get the exception information you need.
Also, you should use "HEAD" as your method, since you only want the status code:
request.Method = "HEAD";
Read about the difference here.

I want to check whether the file in a url entered exists or not using .net

I am developing a tool that validates the links in a page at a given URL. Suppose I have entered a URL (e.g. http://www-review-k6.thinkcentral.com/content/hsp/science/hspscience/na/gr3/se_9780153722271_/content/nlsg3_006.html) in textbox1, and I want to check whether the targets of all the links on that page exist on the remote server. Finally, I want a log file of the broken links.
You can use HttpWebRequest.
Note four things:
1) The request will throw an exception if the link doesn't exist.
2) You may want to disable automatic redirects.
3) You may also want to check whether it's a valid URL. If it is not, a UriFormatException is thrown.
UPDATED
4) As Paige suggested, use "HEAD" in request.Method so that it won't download the whole remote file.
static bool UrlExists(string url)
{
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "HEAD";
request.AllowAutoRedirect = false;
// Dispose the response so the connection is released
using (request.GetResponse()) { }
}
catch (UriFormatException)
{
// Invalid URL
return false;
}
catch (WebException ex)
{
// Valid URL, but the request failed; ex.Response is null if no HTTP response arrived
HttpWebResponse webResponse = ex.Response as HttpWebResponse;
if (webResponse == null || webResponse.StatusCode == HttpStatusCode.NotFound)
{
return false;
}
}
return true;
}
Use the HttpWebResponse class:
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.gooogle.com/");
HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse();
if (response.StatusCode == HttpStatusCode.NotFound)
{
// do something
}
bool LinkExist(string link)
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(link);
// Note: GetResponse throws a WebException for 4xx/5xx statuses, including 404
using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
{
return webResponse.StatusCode != HttpStatusCode.NotFound;
}
}
Use an HTTP HEAD request as explained in this article: http://www.eggheadcafe.com/tutorials/aspnet/2c13cafc-be1c-4dd8-9129-f82f59991517/the-lowly-http-head-reque.aspx
Make a HTTP request to the URL and see if you get a 404 response. If so then it does not exist.
Do you need a code example?
If your goal is robust validation of page source, consider using a tool that is already written, like the W3C Link Checker. It can be run as a command-line program that handles finding links, pictures, CSS, etc. and checking them for validity. It can also recursively check an entire website.
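If a ready-made tool is not an option, the link-finding step might be sketched as below. This is a rough illustration under stated assumptions: the class and method names are hypothetical, and the regular expression only catches quoted href attributes, so a real HTML parser would be more reliable. Each returned URI could then be passed to a check such as UrlExists, with failures written to the log file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Crude pattern: matches href="..." or href='...' and captures the value.
    private static readonly Regex Href = new Regex(
        "href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase);

    // Returns the absolute http/https URIs linked from the given page source.
    // Relative links are resolved against the address of the page itself.
    public static List<Uri> ExtractLinks(Uri pageUri, string html)
    {
        var links = new List<Uri>();
        foreach (Match m in Href.Matches(html))
        {
            Uri absolute;
            if (Uri.TryCreate(pageUri, m.Groups[1].Value, out absolute)
                && (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            {
                links.Add(absolute);
            }
        }
        return links;
    }
}
```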
