Opening IE, navigate to URL and get source code in C# - c#

I need to open Internet Explorer at a given URL and then read the source code of the document in C#.
Is this possible?
I know you can start processes, but how can I navigate to a URL and get the source code?
I have to open it via IE, because the protocol I'm using to retrieve the page only works in IE.
Thanks!

The following gets you the HTML from a URL without opening IE:
using(WebClient client = new WebClient()) {
string html = client.DownloadString(address);
}
To open IE at a particular URL you can do:
System.Diagnostics.Process.Start("iexplore", "http://example.com");

Depending on your requirements there are different techniques:
Process.Start("iexplore.exe", "http://www.google.com"); to run IE and then WebClient.DownloadString to download the HTML source (this sends 2 HTTP requests to the server)
Use the WebBrowser control which allows you to embed IE in a desktop application. It also allows you to retrieve the HTML source code of the webpage to which it navigated.
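A minimal sketch of the WebBrowser approach, assuming a Windows Forms application on the .NET Framework. The URL is a placeholder, and printing the length of the source stands in for whatever processing is actually needed.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main()
    {
        var form = new Form();
        var browser = new WebBrowser { Dock = DockStyle.Fill };

        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentCompleted can fire once per frame; only act on the top-level document.
            if (e.Url != browser.Url) return;

            string html = browser.DocumentText; // HTML source of the loaded page
            Console.WriteLine(html.Length);
        };

        form.Controls.Add(browser);
        // Placeholder URL (assumption); start navigating once the form has loaded.
        form.Load += (sender, e) => browser.Navigate("http://example.com");
        Application.Run(form);
    }
}
```

Because the control hosts the IE engine, the page is fetched by IE itself, so only one request is sent instead of two.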

Related

Accessing downloaded file instead of page HTML

I have some code that connects to an HTTP API, and is supposed to get an XML response. When the API link is placed in a browser, the browser downloads the XML as a file. However, when the code connects to the same API, HTML is returned. I've told the API owner but they don't think anything is wrong. Is there a way to capture the downloaded file instead of the HTML?
I've tried setting the headers to make my code look like a browser. Also tried using WebRequest instead of WebClient. But nothing works.
Here is the code, the URL works in the browser (file downloaded) but doesn't work for WebClient:
WebClient webClient = new WebClient();
string result = webClient.DownloadString(url);
The code should somehow get the XML file instead of the page HTML (actually the HTML doesn't appear in the browser, only the file).
The URI you access may be an HTML page with its own mechanism for reaching the real file, for example generating the actual download address dynamically on the server and redirecting to it in order to prevent external linking.
In that case you would use an embedded browser core such as CefSharp, running in the background, to execute the HTML and its JavaScript so that it can navigate, and you will probably want to hook the download event to handle the download.
I think you need to add an Accept header to the WebClient object.
using (var client = new WebClient())
{
client.Headers[HttpRequestHeader.Accept] = "application/xml;q=1";
string result = client.DownloadString(url);
}
Thank you all for your input. In the end, it was caused by our vendor switching to TLS 1.2. I just had to force my code to use 1.2 and then it worked.
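For reference, forcing TLS 1.2 on the .NET Framework (4.5 or later) is typically done through ServicePointManager before the first request. This is a sketch of that idea rather than the poster's actual code; the URL is taken from the command line as a placeholder.

```csharp
using System;
using System.Net;

static class Program
{
    static void Main(string[] args)
    {
        // Restrict outgoing HTTPS connections to TLS 1.2.
        // This must be set before the first request is made.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        string url = args[0]; // placeholder: the API URL is not given in the question

        using (var client = new WebClient())
        {
            string result = client.DownloadString(url);
            Console.WriteLine(result.Length);
        }
    }
}
```

Using `|=` instead of `=` would add TLS 1.2 to the protocols already enabled rather than replacing them.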

Download from url that generates a file

I need to be able to download a file from a URL. The URL does not link to an actual file; instead it generates the file on the server first and then shows a download dialog. It probably returns an MVC FileResult.
I'm just interested in getting the byte[] from the file.
I've tried:
using (var webClient = new WebClient())
{
System.Uri uri = new System.Uri(Document.Url);
bytes = await webClient.DownloadDataTaskAsync(uri);
}
This works but I get a corrupted file as expected.
I do not have control over how the server generates or serves the file.
Any way to wait for the file to complete generating and then get the file content?
TIA
Never mind. Turns out the link returns some javascript that auto authenticates a user and then does a jquery get to another url and port to generate the file. So I was basically downloading that script and saving it to pdf. doh.
So a work around would be to mimic that in some way.

Unable to download file (page url same as download url)

I'm trying to download a zip file (that is normally accessed/downloaded by pressing a button on a web page) using C#.
Normally the file is downloaded by selecting "Data Export" and then clicking the "SEARCH" button at this URL:
http://insynsok.fi.se/SearchPage.aspx?reporttype=0&culture=en-GB&fromdate=2016-05-30&tomdate=2016-06-03
If I trigger the download manually on the web page and then copy the download URL from the 'Downloads' view of Chrome or Firefox, I get the exact same URL as above. When I paste that into a browser window it does not trigger the download; instead the above page is loaded and I have to trigger the download manually, the same way as before.
I've also tried using the network tab of the inspector to copy the request header of the request that is triggered when clicking the "SEARCH" button, but that URL is also the same as the one above.
Trying with C# I get the same result, the page itself is downloaded. My code looks as follows:
using (var client = new WebClient())
{
client.DownloadFile("http://insynsok.fi.se/SearchPage.aspx?reporttype=0&culture=sv-SE&fromdate=2016-05-30&tomdate=2016-06-03", "zipfile.zip");
}
My guess is that my code is correct, but how do I get the correct URL to be able to download the file directly?
ASP.NET inserts a bunch of crap into the page to make things like this particularly hard (validation tokens, form tokens, etc.).
Your best bet is to use a Python library called Mechanize, or if you want to stick to C# you can use Selenium or the C# WebBrowser control. This fully automates visiting the page (you can make the WebBrowser invisible); then just click the button programmatically to trigger the download.
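To illustrate what those tokens look like, here is a rough sketch that collects the hidden form fields (such as __VIEWSTATE and __EVENTVALIDATION on ASP.NET Web Forms pages) from a downloaded page. The HiddenFields class is a hypothetical helper, not part of any library, and the regex-based parsing is a simplification that assumes double-quoted attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class HiddenFields
{
    // Returns the name/value pairs of all <input type="hidden"> elements in the HTML.
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        var inputTag = new Regex("<input[^>]*type=\"hidden\"[^>]*>", RegexOptions.IgnoreCase);

        foreach (Match tag in inputTag.Matches(html))
        {
            Match name = Regex.Match(tag.Value, "\\sname=\"([^\"]*)\"", RegexOptions.IgnoreCase);
            Match value = Regex.Match(tag.Value, "\\svalue=\"([^\"]*)\"", RegexOptions.IgnoreCase);
            if (!name.Success) continue;

            // Attribute values are HTML-encoded in the page source.
            fields[name.Groups[1].Value] =
                WebUtility.HtmlDecode(value.Success ? value.Groups[1].Value : "");
        }
        return fields;
    }
}
```

Replaying the button click would then mean posting these fields back to the same URL together with the form's own inputs and the session cookies. The names of those inputs are page-specific and are not given in the question, which is why driving a real browser is often the easier route.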

C# wpf webbrowser control - download file

I'm using the webbrowser control to scrape my medical information from my Health Care provider,
The website is secured using a username and password, I've managed to scrape everything I need except some pdf file.
After navigating to the page I get this javascript "Loading...", In a regular browser I'll see the PDF file rendered in the browser, but for the webbrowser control it doesn't display the pdf, I get the famous yellow notification bar.
The url for the pdf file is like this
"https://www.***.com/phoenix/views/akgCharts/zoomAkgChart.jsp?&date=20130502&time=123000",
I'm using mshtml to do all the scraping, but I can't find the file in the mshtml object. Using Fiddler 2.0 I can see that the PDF file is downloaded to the computer (somewhere in memory; I didn't find it in any folder).
Any idea??
If you know the URL that sends you the file, you can try something like this:
System.Net.WebClient _wclient = new System.Net.WebClient();
// DateTime.Now's default string contains characters that are invalid in file names, so format it explicitly.
_wclient.DownloadFile("https://www.***.com/phoenix/views/akgCharts/zoomAkgChart.jsp?&date=20130502&time=123000", @"c:\MedicalReport_" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + ".pdf");

Use HTML string from Server Request, and create the web page without saving it to a file [in C#]

I'm sending the value of a variable via POST to a PHP page in C#. I get back a data stream from the server containing the whole web page in HTML, including the POSTed value. This information is stored in a string variable.
I would like to open a browser and show the web page (maybe using System.Diagnostics.Process.Start("URL")) without having to save it to a file. That is, show the page right away and, once the browser is closed, leave no file stored on the server.
Any idea?
Drop a WebBrowser control (webBrowser1) onto a new form and set its DocumentText property to your result HTML:
webBrowser1.DocumentText = ("<html><body>hello world</body></html>");
source:
<html><body>hello world</body></html>
You aren't going to be able to do that in an agnostic way.
If you simply wanted to open the URL in a browser, then using the Process class would work.
Unfortunately, in your case, you already have the content from creating the POST to the server, and you really want to stream that response in your application to the browser.
It's possible with some browsers, but it can't be done in a browser-agnostic way (and it's complicated even when targeting a specific browser).
To complicate matters, you want the browser to believe that the stream you are sending it is really coming from the server, when in reality, it's not.
I believe that your best bet would be to save the response to the file system in a temp file. However, before you do, add the <base> tag to the file with the URL that the file came from. This way, relative URLs will resolve correctly when rendered in the browser.
Then, just open the temporary file in the browser using the Process class.
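The temp-file approach could be sketched roughly as follows. TempPage, InsertBaseTag and ShowInBrowser are hypothetical names chosen for illustration, and the <head> detection is a simple regex rather than a real HTML parser.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class TempPage
{
    // Inserts a <base> tag right after the opening <head> tag so that
    // relative URLs resolve against the original server address.
    public static string InsertBaseTag(string html, string baseUrl)
    {
        string tag = "<base href=\"" + WebUtility.HtmlEncode(baseUrl) + "\">";
        var head = new Regex("<head(\\s[^>]*)?>", RegexOptions.IgnoreCase);

        if (head.IsMatch(html))
            return head.Replace(html, m => m.Value + tag, 1);

        // No <head> found: fall back to prepending the tag.
        return tag + html;
    }

    // Writes the page to a temporary .html file and opens it in the default browser.
    public static void ShowInBrowser(string html, string baseUrl)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
        File.WriteAllText(path, InsertBaseTag(html, baseUrl));

        // UseShellExecute lets the OS pick the program registered for .html files.
        Process.Start(new ProcessStartInfo(path) { UseShellExecute = true });
    }
}
```

The file is written on the client machine, not on the server, and can be deleted once the browser has loaded it.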
