How do I download a pdf from a Javascript-generated POST request? - c#

From what I can tell, the site requires a button click, which runs some JavaScript and then sends a POST request. The POST request returns a PDF. All the solutions I've found for downloading a file either use WebClient (but I don't have a URL for the PDF) or HttpWebRequest (which can't invoke a click).
I can get to the point of invoking the click with WebBrowser, and I can see using Fiddler that the pdf is getting returned in the site's response, but I have no idea how to get it onto my machine.

I was able to solve this using Selenium, with the following config settings passed into the Firefox driver:
var profile = new FirefoxProfile();
profile.SetPreference("browser.download.dir", saveDir);
profile.SetPreference("browser.download.folderList", 2);
profile.SetPreference("browser.helperApps.neverAsk.saveToDisk", "application/pdf");
profile.SetPreference("pdfjs.disabled", true);
profile.SetPreference("browser.tabs.remote.autostart", false);
profile.SetPreference("browser.tabs.remote.autostart.1", false);
profile.SetPreference("browser.tabs.remote.autostart.2", false);
profile.SetPreference("browser.tabs.remote.force-enable", "false");
var driver = new FirefoxDriver(profile);
Where saveDir is the target download directory. The first half of these config settings are to make Firefox download without a prompt, and the value for browser.helperApps.neverAsk.saveToDisk is a MIME type. The second half of the configs prevent Firefox from crashing when driver.Quit() is called.

Related

Accessing downloaded file instead of page HTML

I have some code that connects to an HTTP API, and is supposed to get an XML response. When the API link is placed in a browser, the browser downloads the XML as a file. However, when the code connects to the same API, HTML is returned. I've told the API owner but they don't think anything is wrong. Is there a way to capture the downloaded file instead of the HTML?
I've tried setting the headers to make my code look like a browser. Also tried using WebRequest instead of WebClient. But nothing works.
Here is the code, the URL works in the browser (file downloaded) but doesn't work for WebClient:
WebClient webClient = new WebClient();
string result = webClient.DownloadString(url);
The code should somehow get the XML file instead of the page HTML (actually the HTML doesn't appear in the browser, only the file).
The URI you are accessing may point to an HTML page with its own download mechanism (for example, it may generate the actual download address dynamically on the server and redirect to it, in order to prevent hotlinking) rather than to the file itself.
In that case you need an embedded browser engine such as CefSharp to run the page's HTML and JavaScript so it can navigate for you, and you will probably want to hook the download event to handle the file.
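A sketch of what hooking the download event in CefSharp can look like. This is not a definitive implementation: the IDownloadHandler members shown match older CefSharp releases (newer releases add members such as CanDownload), and the save path is just an example, so check the CefSharp documentation for your version.

```csharp
using CefSharp;

// Saves every download to a fixed folder without prompting the user.
public class SaveToDiskDownloadHandler : IDownloadHandler
{
    public void OnBeforeDownload(IWebBrowser chromiumWebBrowser, IBrowser browser,
        DownloadItem downloadItem, IBeforeDownloadCallback callback)
    {
        if (!callback.IsDisposed)
        {
            using (callback)
            {
                // Example target path; SuggestedFileName comes from the response headers.
                callback.Continue(@"C:\temp\" + downloadItem.SuggestedFileName,
                                  showDialog: false);
            }
        }
    }

    public void OnDownloadUpdated(IWebBrowser chromiumWebBrowser, IBrowser browser,
        DownloadItem downloadItem, IDownloadItemCallback callback)
    {
        // downloadItem.IsComplete / downloadItem.PercentComplete can be tracked here.
    }
}
```

Wire it up with something like `browser.DownloadHandler = new SaveToDiskDownloadHandler();` on the ChromiumWebBrowser instance.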
I think you need to add an Accept header to the WebClient object.
using (var client = new WebClient())
{
    client.Headers[HttpRequestHeader.Accept] = "application/xml;q=1";
    string result = client.DownloadString(url);
}
Thank you all for your input. In the end, it was caused by our vendor switching to TLS 1.2. I just had to force my code to use 1.2 and then it worked.
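For reference, forcing TLS 1.2 on .NET Framework is a one-line change before the request is made. A minimal sketch (on .NET 4.0 the Tls12 value is not defined in the enum and must be cast from its numeric value, 3072):

```csharp
using System.Net;

class Program
{
    static void Main()
    {
        // Enable TLS 1.2 in addition to whatever protocols are already allowed.
        // On .NET 4.0: ServicePointManager.SecurityProtocol |= (SecurityProtocolType)3072;
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;

        using (var client = new WebClient())
        {
            // string result = client.DownloadString(url); // network call omitted here
        }
    }
}
```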

Unable to download file (page url same as download url)

I'm trying to download a zip file (that is normally accessed/downloaded by pressing a button on a web page) using C#.
Normally the file is downloaded by selecting "Data Export" and then clicking the "SEARCH" button at this URL:
http://insynsok.fi.se/SearchPage.aspx?reporttype=0&culture=en-GB&fromdate=2016-05-30&tomdate=2016-06-03
If I trigger the download manually on the web page and then copy the download URL from the 'Downloads' view of Chrome or Firefox, I get exactly the same URL as above. When I paste that into a browser window it does not trigger the download; instead the page above is loaded and I have to trigger the download manually, just as before.
I've also tried using the network tab of the inspector to copy the request header of the request that is triggered when clicking the "SEARCH" button, but that URL is also the same as the one above.
Trying with C# I get the same result, the page itself is downloaded. My code looks as follows:
using (var client = new WebClient())
{
client.DownloadFile("http://insynsok.fi.se/SearchPage.aspx?reporttype=0&culture=sv-SE&fromdate=2016-05-30&tomdate=2016-06-03", "zipfile.zip");
}
My guess is that my code is correct, but how do I get the correct URL to be able to download the file directly?
ASP.NET inserts a bunch of crap into the page to make things like this particularly hard (validation tokens, form tokens, etc).
Your best bet is to use a Python library called Mechanize, or if you want to stick to C# you can use Selenium or the C# WebBrowser control. This will fully automate visiting the page (you can render the C# WebBrowser invisible), then just click the button to trigger the download programmatically.
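A minimal WinForms sketch of the WebBrowser approach. The element id "SearchButton" is a placeholder, not the real id on that page; inspect the page's HTML to find the actual button.

```csharp
using System;
using System.Windows.Forms;

class Downloader
{
    [STAThread]
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };
        browser.DocumentCompleted += (s, e) =>
        {
            // Once the page has loaded, find the search button and click it,
            // which fires the same JavaScript/postback a user's click would.
            var button = browser.Document.GetElementById("SearchButton"); // placeholder id
            button?.InvokeMember("click");
        };
        browser.Navigate("http://insynsok.fi.se/SearchPage.aspx?reporttype=0&culture=en-GB&fromdate=2016-05-30&tomdate=2016-06-03");
        Application.Run(); // keep a message loop alive so navigation events fire
    }
}
```

Note that the WebBrowser control still shows IE's own file-download prompt when the response arrives, and suppressing that takes extra work; the Selenium approach in the first answer avoids this.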

Redirect to .pdf

On browsing local.com/test/document, I want to redirect to local.com/document/demo.pdf
The redirect itself works, i.e. the demo.pdf file is downloaded, but the browser URL does not change.
I am using following lines of code for redirection:
HttpContext.Current.Response.Redirect("local.com/document/demo.pdf", false);
context.ApplicationInstance.CompleteRequest();
Is this the correct behavior as we are redirecting to file? or is there a way where file can be downloaded and also browser url gets changed?
It is not possible using Response.Redirect() alone: the browser either downloads the file or changes the displayed URL, not both at once.
You can use
<a href="http://xyz.pdf" target="_self">Click Me</a>
Change the target to _blank and it will open the PDF in a new tab.
You can also go with
<iframe src="xyz.pdf"></iframe>

Visual Studio deleting source code when running Web Project

I am upgrading a Web project from Windows XP / Visual Studio 2010 to Windows 8.1 and Visual Studio 2013. When I do this I get a migration report showing two warnings and 15 other messages, none of which appear to be of any consequence. I then adjust the target framework for the web project to 4.5.1 and run the project.
This displays the web page as I expect, but any interaction with it (selecting a new item on a pull-down, for instance) results in the error:-
HTTP Error 405.0 - Method Not Allowed
The page you are looking for
cannot be displayed because an invalid method (HTTP verb) is being
used.
Attempting to discover the reason for this, I find that all the source code files (.aspx, .cs, .config and .css files) are missing. Fortunately I can recover them from the backups that the migration process made, but this is still rather alarming. Can anyone tell me how to prevent this? What project setting might be responsible?
Edit I have tried copying the code back into the project directory after displaying the web page for the first time. Selecting a new item on the pull-down then works, but deletes the source code again. So the HTTP error appears to be a consequence of the page being actually missing during the post-back.
The page you are looking for cannot be displayed because an invalid
method (HTTP verb) is being used.
Cause 1
This problem occurs because the client makes an HTTP request by using an HTTP method that does not comply with the HTTP specifications.
Resolution :
Make sure that the client sends a request that contains a valid HTTP method. To do this, follow these steps:
Click Start, type Notepad in the Start Search box, right-click Notepad, and then click Run as administrator.
Note If you are prompted for an administrator password or for a confirmation, type the password, or provide confirmation.
On the File menu, click Open. In the File name box, type %windir%\system32\inetsrv\config\applicationhost.config, and then click Open.
In the ApplicationHost.config file, locate the <handlers> tag.
Make sure that all the handlers use valid HTTP methods.
Save the ApplicationHost.config file.
Cause 2 :
This problem occurs because a client makes an HTTP request by sending the POST method to a page that is configured to be handled by the StaticFile handler. For example, a client sends the POST method to a static HTML page. However, pages that are configured for the StaticFile handler do not support the POST method.
Resolution :
Send the POST request to a page that is configured to be handled by a handler other than the StaticFile handler (for example, the ASPClassic handler). Or, change the request that is being handled by the StaticFile handler so that it is a GET request instead of a POST request.
When the program runs it uses scratch files; on startup it deletes the old ones by removing all files matching:
HttpContext.Current.User.Identity.Name + "*.*";
which at one time matched only the files intended. On the new Windows 8 machine, HttpContext.Current.User.Identity.Name resolves to the empty string, with the inevitable consequences...
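A defensive version of that cleanup, as a sketch (the `scratchDir` parameter is an assumed stand-in for wherever the scratch files live): it refuses to delete anything when the user name is empty, since an empty prefix turns the pattern into "*.*" and matches every file.

```csharp
using System;
using System.IO;

static class ScratchCleanup
{
    // Deletes only scratch files that belong to a real, non-empty user name.
    public static void DeleteScratchFiles(string scratchDir, string userName)
    {
        if (string.IsNullOrWhiteSpace(userName))
            return; // an empty name would match "*.*" and wipe the whole directory

        foreach (var file in Directory.GetFiles(scratchDir, userName + "*.*"))
            File.Delete(file);
    }
}
```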
Please excuse me while I curl up with embarrassment.

Download office document without the web server trying to render it

I'm trying to download an InfoPath template that's hosted on SharePoint. If I hit the url in internet explorer it asks me where to save it and I get the correct file on my disk. If I try to do this programmatically with WebClient or HttpWebRequest then I get HTML back instead.
How can I make my request so that the web server returns the actual xsn file and doesn't try to render it as HTML? If Internet Explorer can do this then it's logical to think that I can too.
I've tried setting the Accept property of the request to application/x-microsoft-InfoPathFormTemplate, but that hasn't helped. It was a shot in the dark.
I'd suggest using Fiddler or WireShark, to see exactly how IE is sending the request, then duplicating that.
Have you tried spoofing Internet Explorer's User-Agent?
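A quick sketch of the User-Agent idea with WebClient. The User-Agent string below is only an example; copy the exact string your browser sends (visible in Fiddler or the network inspector):

```csharp
using System.Net;

class Program
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // Some servers vary the response on this header; pretend to be a browser.
            client.Headers[HttpRequestHeader.UserAgent] =
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"; // example value only
            // byte[] data = client.DownloadData(url); // url: the .xsn template address
        }
    }
}
```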
There is a HTTP response header that makes a HTTP user agent download a file instead of trying to display it:
Content-Disposition: attachment; filename=paper.doc
I understand that you may not have access to the server, but this is one straight-forward way to do this if you can access the server scripts.
See the HTTP/1.1 specification and/or say, Google, for more details on the header.
This is vb.net, but you should get the point. I've done this with an .aspx page that you pass the filename into, then return the content type of the file and add a header to make it an attachment, which prompts the browser to treat it as such.
Response.AddHeader("Content-Disposition", "attachment;filename=filename.xsn")
Response.ContentType = "application/x-microsoft-InfoPathFormTemplate"
Response.WriteFile(FilePath) ''//Where FilePath is... the path to your file ;)
Response.Flush()
Response.End()
