Detect that a PHP link causes a file download in C#

I can't seem to find a working answer to my problem and I wonder if someone out there can help. Basically I have a link on my website which causes a zip file to be downloaded:
http://***.com/download.php?id=1
If you activate this link on the web page it brings up a Save As dialog and lets you save the file with the default name ThisIsMyZipFile.zip.
My problem is that under C#, if I use new Uri("http://***.com/download.php?id=1").IsFile it returns false, so I cannot seem to detect that this is a file without performing a WebClient DownloadString and checking whether the first two bytes are PK.
Also, even if I manually download the content as a string, detect the PK header, and save the file, I cannot find out what my web site wants to use as the default filename (ThisIsMyZipFile.zip in this example), as I want to use the same filename.
Does anyone know a nice way of solving these two problems, please?
UPDATE
Thanks to Paul and his answer I created the following function which does exactly what I need:
/// <summary>
/// Returns the HTTP response headers of the given URL and, if the link refers to a file, extra information about it.
/// </summary>
/// <param name="Url">The address.</param>
/// <returns>
/// null if a WebException is thrown;
/// otherwise a list of headers:
/// Keep-Alive - Timeout value (e.g. timeout=2, max=100)
/// Connection - The type of connection (e.g. Keep-Alive)
/// Transfer-Encoding - The type of encoding used for the transfer (e.g. chunked)
/// Content-Type - The type of content that will be transferred (e.g. application/zip)
/// Date - The server's date and time
/// Server - The server that is handling the request (e.g. Apache)
/// AbsoluteUri - The full Uri of the resulting link that will be followed.
/// The following key will be present if the link refers to a file:
/// Filename - The filename (not path) of the file that will be downloaded if the link is followed.
/// </returns>
public Dictionary<string, string> GetHTTPResponseHeaders(string Url)
{
    WebRequest WebRequestObject = WebRequest.Create(Url);
    // Use HEAD so the headers come back without the body being transferred
    WebRequestObject.Method = "HEAD";
    WebResponse ResponseObject = null;
    try
    {
        ResponseObject = WebRequestObject.GetResponse();
    }
    catch (WebException)
    {
        return null;
    }
    // Add the header information to the resulting list
    Dictionary<string, string> HeaderList = new Dictionary<string, string>();
    foreach (string HeaderKey in ResponseObject.Headers)
        HeaderList.Add(HeaderKey, ResponseObject.Headers[HeaderKey]);
    // Add the resolved Uri to the resulting list
    HeaderList.Add("AbsoluteUri", ResponseObject.ResponseUri.AbsoluteUri);
    // If this is a zip file then add the download filename specified by the server to the resulting list
    // (StartsWith copes with content types that carry parameters, e.g. "application/zip; charset=...")
    if (ResponseObject.ContentType.ToLower().StartsWith("application/zip"))
    {
        HeaderList.Add("Filename", ResponseObject.ResponseUri.Segments[ResponseObject.ResponseUri.Segments.Length - 1]);
    }
    // We are now finished with our response object
    ResponseObject.Close();
    // Return the resulting list
    return HeaderList;
}

Uri.IsFile performs a static check on the URI: it simply looks at whether the 'scheme' part (the first bit, up to and including the colon) is file:. It does not look at the actual content returned by requesting the resource that resides at the URI. (In fact, because it does not attempt to contact the server at all, the URI could point to a missing resource and IsFile would still return an answer.)
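For example, the scheme check can be seen directly (example.com is just a placeholder host; neither URI is actually contacted):
var a = new Uri("file:///C:/temp/archive.zip");
var b = new Uri("http://example.com/download.php?id=1");
Console.WriteLine(a.IsFile); // True  (scheme is file:)
Console.WriteLine(b.IsFile); // False (scheme is http:, regardless of what the server would return)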
If you wish to see if the content of the resource is of a particular type then you will have to either:
Retrieve the HTTP headers for the resource (if it is an HTTP or HTTPS resource: that is, if the 'scheme' is http or https).
Retrieve (at least part of) the resource and examine it.
You are currently doing option 2, but for an HTTP resource (with an HTTP URL) it would be cleaner and cheaper to do option 1. You can do this by performing an HTTP HEAD request (as opposed to GET or POST, etc.). This returns the HTTP headers without returning the resource itself. The code would look something like:
var request = WebRequest.Create("http://somewhere.overtherainbow.com/?a=b");
request.Method = "HEAD";
using (var response = (HttpWebResponse)request.GetResponse())
{
    // TODO: check response.StatusCode before trusting the headers
    string contentType = response.ContentType;
}
The content type will give you some indication of the file type, but many binary files will just be returned as an octet stream, so you may still need to retrieve and examine the magic bytes of the resource itself if you wish to differentiate between different binary file types. (The content type should be sufficient for you to differentiate between a binary file and a web page though.)
So, a full solution may be:
Send a GET request for the resource.
Check the response status to make sure there was no error.
Check the content type header to see if we have a binary octet stream.
Read two bytes from the response stream to see if the file starts with 'PK'.
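Put together, the steps above might look something like this (the URL is a placeholder and error handling is kept minimal):
var request = (HttpWebRequest)WebRequest.Create("http://example.com/download.php?id=1");
using (var response = (HttpWebResponse)request.GetResponse())
{
    // Step 2: make sure there was no error
    if (response.StatusCode == HttpStatusCode.OK &&
        // Step 3: a binary stream rather than a web page
        !response.ContentType.StartsWith("text/"))
    {
        // Step 4: read the first two bytes and look for the zip magic number
        var magic = new byte[2];
        using (var stream = response.GetResponseStream())
        {
            int read = stream.Read(magic, 0, 2);
            bool looksLikeZip = read == 2 && magic[0] == (byte)'P' && magic[1] == (byte)'K';
        }
    }
}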

You absolutely cannot detect that a given URL would cause a file to be downloaded without actually sending an HTTP request to that URL.
Now to the second problem. You could send an HTTP request to download the file and then inspect the Content-Disposition response header, which will contain the filename:
using (var client = new WebClient())
{
    using (var stream = client.OpenRead("http://*.com/download.php?id=1"))
    {
        var disposition = client.ResponseHeaders["Content-Disposition"];
        if (disposition != null)
        {
            var cd = new ContentDisposition(disposition);
            if (!cd.Inline && !string.IsNullOrEmpty(cd.FileName))
            {
                using (var outputStream = File.OpenWrite(cd.FileName))
                {
                    stream.CopyTo(outputStream);
                }
            }
        }
        else
        {
            // The web server didn't send a Content-Disposition response header,
            // so we have no means of determining the filename;
            // you will have to fall back to some default value if you want to store it
        }
    }
}

Related

The request body did not contain the specified number of bytes

I am calling an API from my C# Windows service. In some cases the following error is raised:
The request body did not contain the specified number of bytes. Got 101,379, expected 102,044
In the raw request captured using Fiddler, the content length is specified:
Content-Length: 102044
In the response from the API I receive the following message:
The request body did not contain the specified number of bytes. Got 101,379, expected 102,044
The strange thing for me is that it does not happen for each and every request; it is raised seemingly at random, at different points. The code I am using to set the content length is below:
var data = Encoding.ASCII.GetBytes(requestBody); // requestBody is the JSON string
webRequest.ContentLength = data.Length;
Is it mandatory to provide content length in REST API calls ?
Edit 1:
This is what my sample code looks like for the web request:
webRequest = (HttpWebRequest)WebRequest.Create(string.Format("{0}{1}", requestURI, queryString));
webRequest.Method = RequestMethod.ToString();
webRequest.Headers.Add("Authorization", string.Format("{0} {1}", token_type, access_token));
webRequest.ContentType = "application/json";
var data = Encoding.ASCII.GetBytes(requestBody);
webRequest.ContentLength = data.Length;
using (var streamWriter = new StreamWriter(webRequest.GetRequestStream()))
{
    streamWriter.Write(requestBody);
    streamWriter.Flush();
}
I would suggest instead using HttpClient, as done in the linked post from mjwills. You don't have to set Content-Length yourself there; it is computed from the content you actually supply, so the declared length and the body can never disagree.
Otherwise, the way I see it, something is making the declared Content-Length and the bytes actually written disagree. One likely culprit in the code shown: you compute the length with Encoding.ASCII.GetBytes(requestBody), but StreamWriter encodes the string again (UTF-8 by default) when writing it to the request stream, so the two byte counts can differ whenever the JSON contains non-ASCII characters. Either write the data byte array you measured directly to the request stream, or use the same encoding in both places, and inspect how the request body is composed.
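As a sketch of the HttpClient suggestion (requestURI, requestBody, and access_token as in the question; the Bearer scheme is an assumption), note that StringContent computes Content-Length from the bytes it actually sends:
using (var client = new HttpClient())
{
    client.DefaultRequestHeaders.Authorization =
        new AuthenticationHeaderValue("Bearer", access_token);
    // StringContent encodes the string once and derives Content-Length from those bytes,
    // so the header can never disagree with the body
    var content = new StringContent(requestBody, Encoding.UTF8, "application/json");
    HttpResponseMessage response = await client.PostAsync(requestURI, content);
    response.EnsureSuccessStatusCode();
}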

Why does WebClient.UploadValues overwrite my HTML web page?

I'm familiar with WinForms and WPF, but new to web development. One day I saw WebClient.UploadValues and decided to try it.
static void Main(string[] args)
{
    using (var client = new WebClient())
    {
        var values = new NameValueCollection();
        values["thing1"] = "hello";
        values["thing2"] = "world";
        // A single file that contains plain html
        var response = client.UploadValues("D:\\page.html", values);
        var responseString = Encoding.Default.GetString(response);
        Console.WriteLine(responseString);
    }
    Console.ReadLine();
}
After running it, nothing is printed, and the html file content becomes this:
thing1=hello&thing2=world
Could anyone explain it? Thanks!
The UploadValues method is intended to be used with the HTTP protocol. This means that you need to host your html on a web server and make the request like this:
var response = client.UploadValues("http://some_server/page.html", values);
In this case the method will send the values to the server by using application/x-www-form-urlencoded encoding and it will return the response from the HTTP request.
I have never used UploadValues with a local file and the documentation doesn't seem to mention anything about it; it only mentions the HTTP and FTP protocols. So I suppose this is a side effect of using it with a local file: it simply overwrites the contents of the file with the payload that is being sent.
You are using WebClient in a way it was not intended to be used.
The purpose of WebClient.UploadValues is to upload the specified name/value collection to the resource identified by the specified URI.
That resource should not be a local file on your disk; it should be a web service listening for requests and issuing responses.

How to check the modify time of a remote file

I need to know the last modification DateTime of a remote file prior to downloading the entire content, to save downloading bytes I am never going to need anyway.
Currently I am using WebClient to download the file, though there is no requirement to keep using WebClient specifically. The Last-Modified key can be found in the response headers, but by that point the entire file has already been downloaded.
WebClient webClient = new WebClient();
byte[] buffer = webClient.DownloadData(uri);
WebHeaderCollection webClientHeaders = webClient.ResponseHeaders;
String modified = webClientHeaders["Last-Modified"];
Also, I am not sure whether that header is always included for every file on the internet.
You can use the HTTP "HEAD" method to just get the file's headers.
...
var request = WebRequest.Create(uri);
request.Method = "HEAD";
...
Then you can extract the last modified date and decide whether to download the file or not.
Just be aware that not all servers implement Last-Modified properly.
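Putting the two together (uri as in the question; lastDownloadTime is a hypothetical value you would track yourself), HttpWebResponse even exposes the header pre-parsed as a DateTime:
var request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "HEAD";
using (var response = (HttpWebResponse)request.GetResponse())
{
    // LastModified is the parsed Last-Modified header
    // (it falls back to the current time if the server omitted the header)
    DateTime lastModified = response.LastModified;
    if (lastModified > lastDownloadTime)
    {
        // worth downloading: fetch the full content with a second (GET) request
    }
}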

System.Net.HttpWebResponse Returning System.IO.Stream.NullStream

I have a case where HttpWebResponse.GetResponseStream() returns System.IO.Stream.Null, even though examination of the HttpWebResponse object reveals that its underlying m_ConnectStream is an instance of System.Net.ConnectStream and the ContentLength property matches exactly the length of the content returned from the server. I also poked around in the Watch window and found my data (though I can't remember where), so I KNOW my response data is there; the runtime just won't let me at it!
The only thing different from other, successful scenarios is that the HttpWebRequest verb is "HEAD". I'm implementing a highly RESTful web service and want to use "HEAD" to request metadata for resources.
Figured it out:
I found the following .NET Fx source code (in the HttpWebResponse class):
/// <devdoc>
///    <para>Gets the stream used for reading the body of the response from the
///       server.</para>
/// </devdoc>
public override Stream GetResponseStream()
{
    if (Logging.On)
        Logging.Enter(Logging.Web, this, "GetResponseStream", "");
    CheckDisposed();
    if (!CanGetResponseStream())
    {
        // give a blank stream in the HEAD case, which = 0 bytes of data
        if (Logging.On)
            Logging.Exit(Logging.Web, this, "GetResponseStream", Stream.Null);
        return Stream.Null;
    }
    if (Logging.On)
        Logging.PrintInfo(Logging.Web, "ContentLength=" + m_ContentLength);
    if (Logging.On)
        Logging.Exit(Logging.Web, this, "GetResponseStream", m_ConnectStream);
    return m_ConnectStream;
}
As you can see it explicitly returns a null stream for "HEAD" requests. "Why would it do that?" I ask.
I found this at http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html:
9.4 HEAD
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
Wow. I had taken from the Richardson and Ruby RESTful Web Services book the idea that you might be clever and respond to a "HEAD" request with a blank XHTML form that fully describes the structure of a resource's elements, including requiredness, datatype, length, etc., using all of the XHTML(5) form field attributes. After reading the HTTP spec, however, it is clear that all "HEAD" response data has to go in the HTTP headers.
Oh well, you learn something new everyday ...

check to see if URL is a download link using webclient c#

I am reading from the history database, and for every URL read I am downloading it and storing the data in a string. I want to be able to determine whether the link is a download link, i.e. .exe or .zip for example. I am assuming I need to read the headers to determine this, but I don't know how to do it with WebClient. Any suggestions?
while (sqlite_datareader.Read())
{
    noIndex = false;
    string url = (string)sqlite_datareader["url"];
    try
    {
        if (url.Contains("http") && (!url.Contains(".pdf")) && (!url.Contains(".jpg")) && (!url.Contains("https")) && !isInBlackList(url))
        {
            WebClient client = new WebClient();
            client.Headers.Add("user-agent", "Only a test!");
            String htmlCode = client.DownloadString(url);
        }
    }
    catch (WebException)
    {
        // skip URLs that can no longer be reached
    }
}
Instead of loading the complete content behind the link, I would issue a HEAD request.
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
Quoted from http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
See these questions for C# examples:
How to check if a file exists on a server using c# and the WebClient class
How to check if System.Net.WebClient.DownloadData is downloading a binary file?
You're on the right track; you'll need to examine the ResponseHeaders after a successful request:
var someType = "application/zip";
if (client.ResponseHeaders["Content-Type"].Contains(someType))
{
    // this was a "download link"
}
The tricky part will be in determining what constitutes a download link since there are so many content types possible. For example, how would you decide whether XML data is a download link or not?
Try checking WebClient's ResponseHeaders collection to validate the response file type.
In case anyone has the same problem: I used an attribute in the history places.sqlite database which came in very handy!
places.sqlite contains a table called moz_historyvisits which has a column visit_type. According to [1], a visit_type of 7 is a download. Therefore, reading this value determines whether it is a download link without reading the response headers or even sending a HEAD request.
[1] http://www.firefoxforensics.com/research/moz_historyvisits.shtml
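As a sketch (assuming the System.Data.SQLite provider already used above, and the standard places.sqlite schema in which moz_historyvisits.place_id joins to moz_places.id; verify the column names against your own database):
string sql = @"SELECT p.url
               FROM moz_places p
               JOIN moz_historyvisits v ON v.place_id = p.id
               WHERE v.visit_type = 7";
using (var cmd = new SQLiteCommand(sql, connection))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        string downloadUrl = (string)reader["url"];
        // every row here was recorded by Firefox as a download visit
    }
}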
