How to retrieve a PDF using WebBrowser or WebClient in .NET? - c#

I am trying to automate the daily retrieval of a web file using .NET.
The file is a PDF located at an address similar to:
http://www.example.com/?s=doc20101022
These are the HTTP response headers, captured for debugging using IE:
HTTP/1.1 200 OK
Server: Apache/2.2.3 (CentOS)
Vary: User-Agent,Accept-Encoding
Expires: 0
Cache-Control: must-revalidate, post-check=0, pre-check=0
Pragma: public
Last-Modified: Mon, 22 Nov 2010 22:45:12 GMT
Cache-Control: private
Content-Disposition: attachment; filename="doc20101022.pdf"
Content-Transfer-Encoding: binary
Content-Type: application/force-download
Date: Tue, 23 Nov 2010 10:41:43 GMT
X-Varnish: 2155914052
Via: 1.1 varnish
Content-Length: 6596997
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Age: 2
Can you please suggest a way to get it and save it locally using WebClient, WebBrowser or other VB.NET (Framework 4.0) components?

Use the DownloadFile or DownloadFileAsync method of WebClient:
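using System;
using System.ComponentModel;
using System.Net;
WebClient wc = new WebClient();
wc.DownloadFileCompleted += new AsyncCompletedEventHandler(delegate(object source, AsyncCompletedEventArgs args) {
    // Raised when the download finishes; check args.Error and args.Cancelled to confirm it succeeded.
});
// The @ prefix makes this a verbatim string, so the backslash is not treated as an escape.
wc.DownloadFileAsync(new Uri("http://www.example.com/?s=doc20101022"), @"C:\Yourfile.pdf");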
Edit: You tagged the question with c# and only mentioned .NET in the subject so I've provided you with a C# solution. If you need it in VB.NET it should be easy to port though.
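For the daily automation part, a minimal synchronous sketch might look like the following. It assumes the s query parameter is always "doc" followed by the date as yyyyMMdd, which is only a guess based on the single sample URL. The DailyPdfDownloader class, its method names, and the target folder are hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Net;

static class DailyPdfDownloader
{
    // ASSUMPTION: the document id is "doc" + yyyyMMdd, inferred from the sample URL only.
    public static string BuildDocumentId(DateTime date)
    {
        return "doc" + date.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }

    public static Uri BuildDailyUri(DateTime date)
    {
        return new Uri("http://www.example.com/?s=" + BuildDocumentId(date));
    }

    // Downloads the PDF for the given date into targetFolder and returns the local path.
    public static string Download(DateTime date, string targetFolder)
    {
        string localPath = Path.Combine(targetFolder, BuildDocumentId(date) + ".pdf");
        using (var client = new WebClient())
        {
            // DownloadFile blocks until the transfer ends and throws WebException on failure.
            client.DownloadFile(BuildDailyUri(date), localPath);
        }
        return localPath;
    }
}
```

A scheduled task could then call DailyPdfDownloader.Download(DateTime.Today, @"C:\Downloads"). The local file name is chosen by the caller here; WebClient.DownloadFile does not use the name from the Content-Disposition header.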

Related

How to use PUT to upload a large file using c# and raw data (no multipart-formdata)

I'm working with the OpenStack Swift API version 1, and am trying to upload a large file to their object storage using c# and DotNet 4.5 in Visual Studio 2015.
See their API documentation at http://developer.openstack.org/api-ref-objectstorage-v1.html in the Objects section under Create or replace object.
Following their example using curl, I was able to upload a small test file. Wireshark shows a very simple protocol with the PUT method, a couple of headers and the raw data:
curl -X PUT -H "X-Auth-Token: ab2716160b394f6aab337f1ea8e9378f" -H "Content-Length: 10" -H "Content-Type: application/octet-stream" -d "1234567890" http://10.25.10.10:8080/v1/AUTH_70466e9f789744e8b0169d398b8492cd/Test/test.dat
PUT /v1/AUTH_70466e9f789744e8b0169d398b8492cd/Test/test.dat HTTP/1.1
User-Agent: curl/7.30.0
Host: 10.25.10.10:8080
Accept: */*
X-Auth-Token: ab2716160b394f6aab337f1ea8e9378f
Content-Length: 10
Content-Type: application/octet-stream
1234567890
HTTP/1.1 201 Created
Last-Modified: Tue, 27 Oct 2015 23:52:45 GMT
Content-Length: 0
Etag: e807f1fcf82d132f9bb018ca6738a19f
Content-Type: text/html; charset=UTF-8
X-Trans-Id: txfb6b1300b3834e178f0ae-0056300e4c
Date: Tue, 27 Oct 2015 23:52:44 GMT
The DotNet WebClient apparently supports only POST when uploading files via its UploadFileAsync method, so that's out. I've found many other upload examples, all using multipart-formdata and POST.
How do I create a PUT request that includes a large file's contents without having to read the file into a buffer?
I appreciate any tips or pointers to examples or documentation!
It turns out that WebClient has an UploadFileTaskAsync method that takes a method parameter, which I can set to PUT.
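A rough sketch of that call, assuming the object URL and auth token are obtained elsewhere. The SwiftUpload class and its parameter names are hypothetical, and the sketch does not verify whether the request body is buffered in memory for very large files.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

static class SwiftUpload
{
    // Sends the file at localPath as the body of a PUT request to objectUri.
    // Returns whatever response body the server sends back (empty for the 201 shown above).
    public static async Task<byte[]> PutFileAsync(Uri objectUri, string authToken, string localPath)
    {
        using (var client = new WebClient())
        {
            // Same headers as the curl example; Content-Length is set by the framework.
            client.Headers.Add("X-Auth-Token", authToken);
            client.Headers.Add(HttpRequestHeader.ContentType, "application/octet-stream");

            // The second argument is the HTTP method to use for the upload.
            return await client.UploadFileTaskAsync(objectUri, "PUT", localPath);
        }
    }
}
```

A failed upload surfaces as a WebException from the awaited task, so the caller can wrap the call in try/catch and inspect the response status there.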

Provide AntiForgery Token with System.Net.Http.HttpClient and MVC

I have a WPF app (it could be any WinForms app, I guess) that tries to log in to a standard MVC 5 website using an HttpClient.
Normally I can log in successfully with a call to PostAsync() where I provide the UserName and Password params in an HttpContent.
However, when I add the [ValidateAntiForgeryToken] attribute to my controller's Login (POST) action, the PostAsync() call fails with Internal Server Error.
I have tried collecting the "__RequestVerificationToken" from a simple GET request and sending it with my POST request by adding it to the POST params, the header of the request or the HttpClientHandler's CookieContainer (or any combination of the three), but I still get error 500 from the server.
I know it can be done with HttpWebRequest (apparently), but I don't know what I'm missing when using an HttpClient. I also don't know what exactly went wrong on the server side, or how to check that, since the code never reaches my controller method.
Did someone else try this by any chance?
EDIT 1:
I'm adding the raw data sent by the browser for both GET and POST:
GET http://localhost:57457/Account/Login HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Referer: http://localhost:57457/Account/Login
Accept-Language: en-US
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
DNT: 1
Host: localhost:57457
Cookie: NavigationTreeViewState=%5b%7b%27N0_1%27%3a%27T%27%2c%27N0%27%3a%27T%27%7d%2c%27N0_1_2%27%2c%7b%7d%5d; style=default; __RequestVerificationToken=Bak42Ga5sHJitYlmut6OgvmqXNmP7kKQRNaMSsLMAUh86iHGGmz5pnNfz_soKu46Wax9sG23arPOTnSh1bvaWyWqQ9NH4GJxFmendW8VFTg1
RESPONSE:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
X-AspNetMvc-Version: 5.2
X-Frame-Options: SAMEORIGIN
X-AspNet-Version: 4.0.30319
X-SourceFiles: =?UTF-8?B?RDpcRUJTLkNvZGVcUHJvamVjdHNcQ1ZSUE9TX1dlYlNpdGVcQ1ZSUE9TX1dlYlNpdGVcQWNjb3VudFxMb2dpbg==?=
X-Powered-By: ASP.NET
Date: Thu, 04 Dec 2014 10:00:00 GMT
Content-Length: 1734
[View page content]
POST http://localhost:57457/Account/Login HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Referer: http://localhost:57457/Account/Login
Accept-Language: en-US
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Content-Length: 180
DNT: 1
Host: localhost:57457
Pragma: no-cache
Cookie: NavigationTreeViewState=%5b%7b%27N0_1%27%3a%27T%27%2c%27N0%27%3a%27T%27%7d%2c%27N0_1_2%27%2c%7b%7d%5d; style=default; __RequestVerificationToken=Bak42Ga5sHJitYlmut6OgvmqXNmP7kKQRNaMSsLMAUh86iHGGmz5pnNfz_soKu46Wax9sG23arPOTnSh1bvaWyWqQ9NH4GJxFmendW8VFTg1
__RequestVerificationToken=Bak42Ga5sHJitYlmut6OgvmqXNmP7kKQRNaMSsLMAUh86iHGGmz5pnNfz_soKu46Wax9sG23arPOTnSh1bvaWyWqQ9NH4GJxFmendW8VFTg1&UserName=test&Password=test
RESPONSE:
HTTP/1.1 400 Bad request (user/password for testing purposes only)
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/8.0
X-AspNetMvc-Version: 5.2
X-Frame-Options: SAMEORIGIN
X-AspNet-Version: 4.0.30319
X-SourceFiles: =?UTF-8?B?RDpcRUJTLkNvZGVcUHJvamVjdHNcQ1ZSUE9TX1dlYlNpdGVcQ1ZSUE9TX1dlYlNpdGVcQWNjb3VudFxMb2dpbg==?=
X-Powered-By: ASP.NET
Date: Thu, 04 Dec 2014 10:00:00 GMT
Content-Length: 4434
[View page content]
EDIT 2:
This is what my app sends for GET and POST:
GET http://localhost:57457/Account/Login HTTP/1.1
Host: localhost:57457
Connection: Keep-Alive
POST http://localhost:57457/Account/Login HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Host: localhost:57457
Cookie: __RequestVerificationToken=df9nBSP_J1IiLrv84RwrkmvbYBrnH4iqv97wRvz6HMPLWBhgI4XzGeAFcschovHwD8mTtHU6xrmVxz1Ku96_BaoB79le_vLTcrgGemU4gjc1
Content-Length: 163
Expect: 100-continue
__RequestVerificationToken=df9nBSP_J1IiLrv84RwrkmvbYBrnH4iqv97wRvz6HMPLWBhgI4XzGeAFcschovHwD8mTtHU6xrmVxz1Ku96_BaoB79le_vLTcrgGemU4gjc1&UserName=test&Password=test
And finally this is the error:
[HttpAntiForgeryException (0x80004005): Validation of the provided anti-forgery token failed. The cookie "__RequestVerificationToken" and the form field "__RequestVerificationToken" were swapped.]
Thanks!
You'd probably need to include the ASP.NET session ID cookie with your requests.
EDIT:
OK, you're right, it is not the session ID, but you need to send two tokens back to your POST action.
I think what you're doing wrong is using the same value for both tokens. They should be different, although both tokens are named __RequestVerificationToken.
The token grabbed from the cookie should be sent back as a cookie, and the token grabbed from the form field should be sent back as a form field.
It's because you're missing the anti-forgery token from HtmlHelper.AntiForgeryToken() in your POST from your application.
You'll need to load a page from your WPF application with HtmlHelper.AntiForgeryToken() on the view. Then take the value of the hidden input element with the name __RequestVerificationToken and attach it to your login POST request to the server.
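Putting the two answers together, a minimal sketch could look like this. It assumes the login page renders the hidden input with the name attribute before the value attribute; the AntiForgeryLogin class and its method names are hypothetical. The cookie token is never copied by hand: the CookieContainer stores it from the GET response and sends it back on the POST.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

static class AntiForgeryLogin
{
    // Pulls the form-field token out of the login page HTML, or returns null if absent.
    // ASSUMPTION: the name attribute appears before the value attribute in the input tag.
    public static string ExtractFormToken(string html)
    {
        Match match = Regex.Match(
            html,
            "<input[^>]*name=\"__RequestVerificationToken\"[^>]*value=\"([^\"]+)\"",
            RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static async Task<HttpStatusCode> LoginAsync(Uri loginUri, string userName, string password)
    {
        // The container keeps the cookie token set by the GET and replays it on the POST.
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        using (var client = new HttpClient(handler))
        {
            string html = await client.GetStringAsync(loginUri);
            string formToken = ExtractFormToken(html);
            if (formToken == null)
            {
                throw new InvalidOperationException("No anti-forgery form token found on the page.");
            }

            // The form token (different from the cookie token) goes back as a form field.
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                { "__RequestVerificationToken", formToken },
                { "UserName", userName },
                { "Password", password }
            });

            using (HttpResponseMessage response = await client.PostAsync(loginUri, form))
            {
                return response.StatusCode;
            }
        }
    }
}
```

The GET and the POST must go through the same HttpClient instance (or at least the same CookieContainer), otherwise the cookie token and the form token will not belong to the same pair.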

How to serve SVG using C# so that it will render as a CSS background image

I have a server side script that generates an SVG output. I'm using MVC3 and in the RenderSVG method I return the content of the SVG like so:
return Content(svgContent, "text/xml; charset=utf-8");
In this case svgContent is simply the contents of an SVG file. If I navigate to my RenderSVG method directly, it renders my image as expected. However, if I set the same URL as the background-image property of a CSS rule, it doesn't render. I am using the latest version of Chrome and can confirm that normal SVG files render normally as a background image, but this server-side version does not.
Any ideas what I'm doing wrong? Here are my response headers for the original image and the server side version. The content of the image is identical in each case.
Original response headers
HTTP/1.1 304 Not Modified
Last-Modified: Mon, 23 Sep 2013 04:56:16 GMT
Accept-Ranges: bytes
ETag: "018f63b19b8ce1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 23 Jan 2014 11:38:38 GMT
Scripted response headers
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/xml; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
X-AspNetMvc-Version: 4.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 23 Jan 2014 11:39:29 GMT
Content-Length: 490
Try using "image/svg+xml" as the MIME type for the SVG:
return Content(svgContent, "image/svg+xml; charset=utf-8");
( see http://www.w3.org/TR/SVG11/mimereg.html )

Basic Authentication over redirection

I have a webservice that needs a Basic Authentication header. However, when I call it using
var header = "Authorization: Basic " +
CreateBasicHttpAuthenticationHeader(login, password);
webRequest.Headers.Add(header);
var webResponse = (HttpWebResponse)webRequest.GetResponse();
It returns a 303 - See Other:
POST https://myservice/rates HTTP/1.1
Authorization: Basic QXZ...NjY=
Content-Type: application/x-content
X-API-Version: 1.1
If-Unmodified-Since: Mon, 23 Sep 2013 08:32:27 GMT
User-Agent: UserAgent
Response:
HTTP/1.1 303 See Other
Date: Mon, 23 Sep 2013 08:30:57 GMT
X-Opaque-ID: q8nxxxxc
Location: https://myservice/rates
Content-Length: 0
.Net then automatically sends a GET request to the new location:
GET https://myservice/rates HTTP/1.1
Content-Type: application/x-content
X-API-Version: 1.1
If-Unmodified-Since: Mon, 23 Sep 2013 08:32:27 GMT
User-Agent: UserAgent
But it does not send the Authorization header this time. Is there a way to tell it to send all headers on every call? Should I tell it not to follow the redirect?
You have to set the AllowAutoRedirect property of your HttpWebRequest to false.
Then you have to build your own redirection handling that re-sends the Basic auth header.
This is not wonderful, but otherwise the .NET Framework drops your header when redirecting.
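A hand-rolled redirect loop along those lines might be sketched as follows. It turns off automatic redirects through HttpWebRequest.AllowAutoRedirect and re-attaches the header on every hop. The BasicAuthRedirect class, its method names, and the hop limit are hypothetical, and for simplicity the sketch only issues GET requests. Re-sending credentials only to the original scheme and host is a precaution added here, not something the answer asks for.

```csharp
using System;
using System.Net;
using System.Text;

static class BasicAuthRedirect
{
    // Builds the value of the Authorization header for HTTP Basic authentication.
    public static string BuildBasicHeaderValue(string login, string password)
    {
        byte[] raw = Encoding.UTF8.GetBytes(login + ":" + password);
        return "Basic " + Convert.ToBase64String(raw);
    }

    // Follows redirects manually so the Authorization header survives each hop.
    public static HttpWebResponse GetWithAuth(Uri start, string login, string password, int maxRedirects = 5)
    {
        Uri current = start;
        for (int hop = 0; hop <= maxRedirects; hop++)
        {
            var request = (HttpWebRequest)WebRequest.Create(current);
            request.AllowAutoRedirect = false;

            // Precaution: only send credentials to the scheme and host we started with.
            bool sameServer = Uri.Compare(current, start, UriComponents.SchemeAndServer,
                UriFormat.Unescaped, StringComparison.OrdinalIgnoreCase) == 0;
            if (sameServer)
            {
                request.Headers[HttpRequestHeader.Authorization] = BuildBasicHeaderValue(login, password);
            }

            var response = (HttpWebResponse)request.GetResponse();
            int status = (int)response.StatusCode;
            string location = response.Headers[HttpResponseHeader.Location];

            // Anything that is not a redirect with a Location header is the final answer.
            if (status < 300 || status >= 400 || string.IsNullOrEmpty(location))
            {
                return response;
            }

            response.Close();
            current = new Uri(current, location); // resolves relative Location values
        }
        throw new WebException("Too many redirects.");
    }
}
```

The caller is responsible for disposing the returned response. A 4xx or 5xx status still surfaces as a WebException from GetResponse, as it does with automatic redirects enabled.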

Windows WebBrowser problem with Ajax code in a page - c#

I'm downloading a site for its content using a web crawler I wrote with the Microsoft WebBrowser control.
Part of the site's content is sent only after some kind of verification from the client side; my guess is that it's cookies or session cookies.
When I try to download the page from my crawler, I see (with Fiddler's help) that the inner link for the Ajax request sends 'false' for one of the parameters, and the data is not received.
When I perform the same action from any browser, Fiddler shows that the parameter is sent as '1'.
After a day of testing, I would be grateful for any lead. Is there a way to manipulate this parameter? Plant cookies? Any other idea?
Following khunj's answer, I'm adding the headers from IE and from my WebBrowser.
In both sets of headers I removed the fields that have the same value.
From IE:
GET /feed/prematch/1-1-234562-8527419630-1-2.dat HTTP/1.1
x-requested-with: XMLHttpRequest
Referer: http://www.mySite.com/ref=12345
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0)
Connection: Keep-Alive
Cookie: __utma=1.1088924975.1299439925.1299976891.1300010848.14;
__utmz=1.1299439925.1.1.utmcsr=(direct)|utmccn=
(direct)|__utmb=2.1.10.1300010848; __utmc=136771054; user_cookie=63814658;
user_hash=58b923a5a234ecb78b7cc8806a0371c5; user_time=1297166428; infobox_8=1;
user_login_id=12345; mySite=5e1c0u8g6qh41o2798ua2bfbi3
HTTP/1.1 200 OK
Date: Sun, 13 Mar 2011 10:07:38 GMT
Server: Apache
Last-Modified: Sun, 13 Mar 2011 10:07:25 GMT
ETag: "26a6d9-19df-49e5a5c9ed140"
Accept-Ranges: bytes
Content-Length: 6623
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain
Content-Encoding: gzip
From WebBrowser:
GET /feed/prematch/1-1-234562-8527419630-false-2.dat HTTP/1.1
x-requested-with: XMLHttpRequest
Referer: http://www.mySite.com/ref=12345
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0)
Connection: Keep-Alive
Cookie: __utma=1.1782626598.1299416994.1299974912.1300011023.129;
__utmb=2.1.10.1300011023; __utmz=1.1299416994.1.1.utmcsr=
(direct)|utmccn=(direct)|__utmc=136771054; user_cookie=65192487;
user_hash=6425034050442671103fdd614e4a2932; user_time=1299416986;
user_full_time_zone=37;user_login_id=12345; mySite=q9qlqqm9bunm9siho32tdqdjo0
HTTP/1.1 404 Not Found
Date: Sun, 13 Mar 2011 10:10:33 GMT
Server: Apache
Content-Length: 313
Connection: close
Content-Type: text/html; charset=iso-8859-1
Thanks in advance,
Oz.
Well, the server is obviously treating the request from your crawler differently. Since you already have Fiddler involved, what is different in your request headers when you make the request from IE versus from your crawler? The reason I say IE is that the WebBrowser control uses the same engine as IE to do its work.
The way I solved my problem was by using Fiddler as a proxy and defining a custom rule: whenever the request's PathAndQuery property contains the site address, replace 'false' with '1'.
It is not the most elegant solution, but it fits my problem.
I learned the most from these 2 pages:
FiddlerScript CookBook
A site that explains the customRules.js file and the field I needed to edit
Thanks for the help,
Oz.
