I need to check whether the content type of a URL is PDF or not. I have working code, but I was wondering what the best way to check is, given what I have. I don't need to display the PDF, just to check whether the content type is PDF.
Note: this method will be called multiple times with different URLs, so I am not sure whether I need to close the response or not.
Here is my code:
private bool IsValid(string url)
{
    bool isValid = false;
    var request = (HttpWebRequest)WebRequest.Create(url);
    var response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK && response.ContentType == "application/pdf")
    {
        isValid = true;
    }
    response.Close();
    return isValid;
}
Yes, since you don't pass the response anywhere, you need to dispose of it. You should also catch WebException and dispose of the stream attached to it as well. (I would also expect that disposing the response, or even the request, would close all related resources, but unfortunately I have never seen documentation confirming such cascading dispose behavior for the response object.)
You also need to close/dispose the request, as it is a one-use object. This is specified in a note on GetResponse:
Multiple calls to GetResponse return the same response object; the request is not reissued.
Side note: consider making a HEAD request so you don't receive a response body at all (see the Method property for usage).
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "HEAD";
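Putting the disposal and HEAD advice together, a minimal sketch of the validation method might look like this (the `IsValid` name comes from the question; error handling is deliberately simplified):

```csharp
private bool IsValid(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "HEAD"; // headers only, no response body is transferred

    try
    {
        // using ensures the response (and its connection) is disposed,
        // which matters when this method is called many times
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return response.StatusCode == HttpStatusCode.OK
                && response.ContentType == "application/pdf";
        }
    }
    catch (WebException ex)
    {
        // non-2xx status codes surface as WebException;
        // dispose the attached response too, if there is one
        if (ex.Response != null)
        {
            ex.Response.Dispose();
        }
        return false;
    }
}
```

Note that some servers report a Content-Type with a charset suffix (e.g. "application/pdf; charset=..."), so a `StartsWith` comparison may be more robust than strict equality.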
I am playing around with an app that uses HttpWebRequest to talk to a web server.
I followed standard instructions found on the web to build my request function, which I tried to make as generic as possible (a single method regardless of the HTTP verb: PUT, POST, DELETE, REPORT, ...).
When I submit a "REPORT" request, I get two access log entries on my server:
1) I get response 401 after the following line runs in the debugger:
reqStream.Write(Encoding.UTF8.GetBytes(body), 0, body.Length);
2) I get response 207 (multi-status, which is what I expect) after passing the line calling Request.GetResponse().
Actually, it seems to be the Request.GetRequestStream() line that contacts the server the first time, but the request is only committed once the reqStream.Write(...) line has run.
Same for PUT and DELETE: Request.GetRequestStream() again generates a 401 entry in my server's access log, whereas Request.GetResponse() returns code 204.
I don't understand why a single request produces two server access log entries, especially one that seems to do nothing, as it always returns code 401. Could anybody explain what is going on? Is it a flaw in my code, or a design problem caused by my attempt to write generic code for multiple methods?
Here is my full code:
public static HttpWebResponse getHttpWebRequest(string url, string usrname, string pwd, string method,
    string contentType, string[] headers, string body)
{
    HttpWebRequest Request;
    HttpWebResponse Response;

    string strSrcURI = url.Trim();
    string strBody = body.Trim();

    try
    {
        // Create the HttpWebRequest object.
        Request = (HttpWebRequest)WebRequest.Create(strSrcURI);

        // Add the network credentials to the request.
        Request.Credentials = new NetworkCredential(usrname.Trim(), pwd);

        // Specify the method.
        Request.Method = method.Trim();

        // Request headers.
        foreach (string s in headers)
        {
            Request.Headers.Add(s);
        }

        // Set the content type header.
        Request.ContentType = contentType.Trim();

        // Set the body of the request. ContentLength must be the encoded
        // byte count, not the character count, or non-ASCII bodies break.
        byte[] bodyBytes = Encoding.UTF8.GetBytes(body);
        Request.ContentLength = bodyBytes.Length;
        using (Stream reqStream = Request.GetRequestStream())
        {
            reqStream.Write(bodyBytes, 0, bodyBytes.Length);
        }

        // Send the request and get the response from the server.
        Response = (HttpWebResponse)Request.GetResponse();

        // Return the response to be handled by the calling method.
        return Response;
    }
    catch (Exception e)
    {
        throw new Exception("Web API error: " + e.Message, e);
    }
}
I've just noticed some behavior in C# that's thrown me off a little. I'm using C# 5 and .NET 4.5. When I call GetResponseStream() on an HttpWebResponse object, I am able to get the response stream, but if I call it again on the same object, the response is blank.
// Works! The body of the response is in the source variable.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
String source = new StreamReader(response.GetResponseStream()).ReadToEnd();
// Does not work; source2 is empty.
String source2 = new StreamReader(response.GetResponseStream()).ReadToEnd();
The above is just an example to demonstrate the problem.
Edit
This is what I'm trying to do. Basically, if an event handler is attached to the HTTP object, the response is passed back to the callback method.
public HttpWebResponse Get(String url)
{
    // HttpWebRequest ...
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();

    // postRequest is an event handler. The response is passed to the
    // callback to do whatever it needs to do.
    if (this.postRequest != null)
    {
        RequestEventArgs requestArgs = new RequestEventArgs();
        requestArgs.source = response;
        postRequest.Invoke(this, requestArgs);
    }
    return response;
}
In the callback method I may want to check the body of the response. If I do, I lose the data from the response by the time Get() returns it.
The response stream reads directly from the network connection.
Once you read it to the end (in the 2nd line), there is no more data to read.
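If the body needs to be read more than once, one option is to drain the network stream into a `MemoryStream` once and reuse that buffered copy; a sketch, assuming the response fits comfortably in memory:

```csharp
using (var response = (HttpWebResponse)request.GetResponse())
using (var networkStream = response.GetResponseStream())
{
    var buffer = new MemoryStream();
    networkStream.CopyTo(buffer); // read the network stream exactly once

    // rewind and read the buffered copy as many times as needed
    buffer.Position = 0;
    string first = new StreamReader(buffer).ReadToEnd();

    buffer.Position = 0;
    string second = new StreamReader(buffer).ReadToEnd(); // same content as first
}
```

Alternatively, read the body into a string up front and pass the string (rather than the live response) to the callback.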
I need to get the URL of the final destination of a shortened URL. At the moment I am doing the following, which seems to work:
var request = WebRequest.Create(shortenedUri);
var response = request.GetResponse();
return response.ResponseUri;
But can anyone suggest a better way?
If the shortened URL is generated by some online service provider, only that provider stores the mapping between the short URL and the actual URL. So you need to query the provider by sending it an HTTP request, exactly as you did. Also, don't forget to properly dispose of IDisposable resources by wrapping them in using statements:
var request = WebRequest.Create(shortenedUri);
using (var response = request.GetResponse())
{
return response.ResponseUri;
}
If the service provider supports the HEAD verb, you could also use that verb and read the Location response header, which must point to the actual URL. Alternatively, you could set the AllowAutoRedirect property to false on the request object and then read the Location response header. That way the client won't redirect to the actual resource and download the entire response body when you aren't interested in it.
Of course, the best approach would be for the service provider to offer an API that gives you the actual URL for a short URL directly.
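A sketch combining both suggestions (HEAD verb plus AllowAutoRedirect = false), assuming the provider answers HEAD requests:

```csharp
var request = (HttpWebRequest)WebRequest.Create(shortenedUri);
request.Method = "HEAD";           // ask for headers only, no body
request.AllowAutoRedirect = false; // don't follow the redirect

using (var response = (HttpWebResponse)request.GetResponse())
{
    // for a 301/302 the target URL is carried in the Location header
    return new Uri(response.Headers["Location"]);
}
```

A real implementation should check `response.StatusCode` before trusting the header, since Location will be null for a non-redirect response.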
You do need to make an HTTP request - but you don't need to follow the redirect, which WebRequest will do by default. Here's a short example of making just one request:
using System;
using System.Net;

class Test
{
    static void Main()
    {
        string url = "http://tinyurl.com/so-hints";
        Console.WriteLine(LengthenUrl(url));
    }

    static string LengthenUrl(string url)
    {
        var request = WebRequest.CreateHttp(url);
        request.AllowAutoRedirect = false;
        using (var response = request.GetResponse())
        {
            var status = ((HttpWebResponse) response).StatusCode;
            // MovedPermanently is 301; Redirect is the 302 most shorteners
            // use. (Moved is just an alias for MovedPermanently.)
            if (status == HttpStatusCode.MovedPermanently ||
                status == HttpStatusCode.Redirect)
            {
                return response.Headers["Location"];
            }
            // TODO: Work out a better exception
            throw new Exception("No redirect required.");
        }
    }
}
Note that this means if the "lengthened" URL is itself a redirect, you won't get the "final" URI as you would in your original code. Likewise if the lengthened URL is invalid, you won't spot that - you'll just get the URL that you would have redirected to. Whether that's a good thing or not depends on your use case...
I am using WebClient to get some info from a page that is sometimes not available (302 Moved Temporarily), so I want the program to detect whether the page exists.
I tried to override WebClient's GetWebResponse with the following code, to return the page only when its status is OK, but it did not work:
protected override WebResponse GetWebResponse(WebRequest request)
{
    var response = base.GetWebResponse(request);
    if (response is HttpWebResponse)
        return (response as HttpWebResponse).StatusCode == HttpStatusCode.OK ? response : null;
    return null;
}
When I used my overridden class to get the page (while it was unavailable), it just redirected and did not return null.
Get code:
private async Task<string> Get(string uri)
{
    return await Handler.DownloadStringTaskAsync(new Uri(uri));
}
[WHAT I WANT TO ACHIEVE]: I want to detect that the web client tried to get the page, found it missing, and was redirected to another page.
WebClient will follow redirects automatically by default (up to a maximum number).
If you override GetWebRequest to modify the returned HttpWebRequest, setting its AllowAutoRedirect property to false, then I believe it will just give you back the 302 directly - although possibly via an exception...
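A sketch of that override (the subclass name is my own; behavior with other WebClient code paths is untested):

```csharp
public class NoRedirectWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // stop WebClient from silently following 3xx responses
            httpRequest.AllowAutoRedirect = false;
        }
        return request;
    }
}
```

With auto-redirect off, a 302 is a non-success status, so DownloadString will likely throw a WebException; catch it and inspect `((HttpWebResponse)ex.Response).StatusCode` to see the redirect.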
This won't tell you the status but it can be inferred by the fact that you were redirected.
if (response.ResponseUri != request.RequestUri)
{
    // if you really want to know the status,
    // set AllowAutoRedirect = false
    // and send another request here.
}
I am developing a tool for validating the links in an entered URL. Suppose I have entered a URL
(e.g. http://www-review-k6.thinkcentral.com/content/hsp/science/hspscience/na/gr3/se_9780153722271_/content/nlsg3_006.html
) in textbox1, and I want to check whether the content of every link exists on the remote server or not. Finally, I want a log file listing the broken links.
You can use HttpWebRequest.
Note four things:
1) The web request will throw an exception if the link doesn't exist.
2) You may want to disable auto-redirect.
3) You may also want to check whether it's a valid URL. If not, it will throw a UriFormatException.
UPDATE
4) As Paige suggested, use "HEAD" as the request method so that it won't download the whole remote file.
static bool UrlExists(string url)
{
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        request.AllowAutoRedirect = false;
        // Dispose the response so the connection is released.
        request.GetResponse().Dispose();
    }
    catch (UriFormatException)
    {
        // Invalid url
        return false;
    }
    catch (WebException ex)
    {
        // Valid url, but the resource does not exist.
        // ex.Response can be null (e.g. on a timeout), so check it first.
        HttpWebResponse webResponse = (HttpWebResponse)ex.Response;
        if (webResponse != null && webResponse.StatusCode == HttpStatusCode.NotFound)
        {
            return false;
        }
    }
    return true;
}
Use the HttpWebResponse class:
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.gooogle.com/");
// Note: for a 404, GetResponse throws a WebException rather than returning,
// so in practice this check belongs in a catch block (as in UrlExists above).
HttpWebResponse response = (HttpWebResponse)webRequest.GetResponse();
if (response.StatusCode == HttpStatusCode.NotFound)
{
    // do something
}
bool LinkExist(string link)
{
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(link);
    HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
    return webResponse.StatusCode != HttpStatusCode.NotFound;
}
Use an HTTP HEAD request as explained in this article: http://www.eggheadcafe.com/tutorials/aspnet/2c13cafc-be1c-4dd8-9129-f82f59991517/the-lowly-http-head-reque.aspx
Make an HTTP request to the URL and see if you get a 404 response. If so, the URL does not exist.
Do you need a code example?
If your goal is robust validation of page source, consider using a tool that is already written, like the W3C Link Checker. It can be run as a command-line program that handles finding links, pictures, CSS, etc., and checking them for validity. It can also recursively check an entire web site.