I have a bank website. I am using WatiN to log in and navigate to a page with links to PDF files. Each link opens a page containing the opened PDF file and a button to download it (no need to click the button, because the page automatically pops up a Save/Save As dialog).
I tried:
1. string page = browser.Body.OuterHtml;
Not working: I can't see the iframe, and I can't find it either.
2. int response = URLDownloadToFile(0, Link, FullFilePath, 0, 0);
Not working: I get the login page back, because I need the cookies.
3. WebClient myWebClient = new WebClient();
myWebClient.DownloadFile(myStringWebResource, fileName);
Gives me the same result.
I can't get the cookies from the WatiN browser and set them in the WebClient:
CookieCollection cookies = _browser.GetCookiesForUrl(new Uri(url));
string cookies = ie.Eval("document.cookie");
The second one returns only one value.
So please don't just tell me to get the cookies from WatiN and set them in the WebClient.
Any ideas how I can save this PDF file?
One option would be to use the iTextSharp library, which provides a number of helpful methods for downloading the PDF. Sample code is below:
Uri uri = new Uri("browser url");
PdfReader reader = new PdfReader(uri);
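For completeness, a minimal sketch of writing the fetched document back to disk, assuming the iTextSharp package is referenced; the URL and output path are placeholders, and this only works if the PDF URL is reachable without the browser's session cookies:
using System;
using System.IO;
using iTextSharp.text.pdf;

static void SavePdf(string pdfUrl, string outputPath)
{
    // PdfReader can fetch the document directly from a Uri.
    PdfReader reader = new PdfReader(new Uri(pdfUrl));

    // PdfStamper writes the (unmodified) document to the output stream when it is closed.
    using (FileStream fs = new FileStream(outputPath, FileMode.Create))
    {
        PdfStamper stamper = new PdfStamper(reader, fs);
        stamper.Close();
    }
    reader.Close();
}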
Related
I am developing an application that shows web pages through a web browser control.
When I click the save button, the web page with its images should be stored in local storage. It should be saved in .html format.
I have the following code:
WebRequest request = WebRequest.Create(txtURL.Text);
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string html = String.Empty;
using (StreamReader sr = new StreamReader(data))
{
html = sr.ReadToEnd();
}
Now string html contains the webpage content. I need to save this into D:\Cache\
How do I save the HTML contents to disk?
You can use this code to write your HTML string to a file:
var path = @"D:\Cache\myfile.html";
File.WriteAllText(path, html);
Further refinement: Extract the filename from your (textual) URL.
Update:
See Get file name from URI string in C# for details. The idea is:
var uri = new Uri(txtUrl.Text);
var filename = uri.IsFile
? System.IO.Path.GetFileName(uri.LocalPath)
: "unknown-file.html";
You have to write the code below in the save button handler:
File.WriteAllText(path, browser.Document.Body.Parent.OuterHtml, Encoding.GetEncoding(browser.Document.Encoding));
Using 'Body.Parent' saves the whole page instead of just a part of it. Check it.
There is nothing built into the .NET Framework for this, as far as I know.
So my approach would be like below:
1. Use System.Net.HttpWebRequest to get the main HTML document as a string or stream (easy). (Which you have done already.)
2. Load this into an HtmlAgilityPack document, where you can now easily query the document to get lists of all image elements, stylesheet links, etc.
3. Then make a separate web request for each of these files and save them to a subdirectory.
4. Finally, update all relevant links in the main page to point to the items in the subdirectory (a rough sketch of these steps follows).
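A rough sketch of those steps (images only, to keep it short), assuming the HtmlAgilityPack NuGet package is referenced; the URL and folder names are placeholders:
using System;
using System.IO;
using System.Net;
using HtmlAgilityPack;

static void SavePage(string url, string targetDir)
{
    Directory.CreateDirectory(Path.Combine(targetDir, "files"));
    using (var client = new WebClient())
    {
        // Step 1: fetch the main HTML document.
        string html = client.DownloadString(url);

        // Step 2: load it into an HtmlAgilityPack document.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Step 3: download each referenced image into a subdirectory.
        var images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            foreach (var img in images)
            {
                string src = img.GetAttributeValue("src", "");
                var absolute = new Uri(new Uri(url), src);
                string localName = Path.GetFileName(absolute.LocalPath);
                client.DownloadFile(absolute, Path.Combine(targetDir, "files", localName));

                // Step 4: point the link at the local copy.
                img.SetAttributeValue("src", "files/" + localName);
            }
        }
        doc.Save(Path.Combine(targetDir, "page.html"));
    }
}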
I've been looking all over for this answer but can't find it anywhere.
This is what I want to be able to do:
I have a form application where I have a button that says "collect html code". When I press this button, I want C# to download the HTML source code of the website I'm currently on (using IE). I've been using this code:
WebClient web = new WebClient();
string html = web.DownloadString("http://www.example.com");
But now I don't want to specify the URL in my code! And I don't want to use a webbrowser in my application.
Anyone got a solution?
Thanks!
With this code you can get the URLs of the open tabs in IE7 and later:
SHDocVw.ShellWindows allBrowsers = new SHDocVw.ShellWindows();
foreach (SHDocVw.InternetExplorer ieInst in allBrowsers )
{
String url = ieInst.LocationURL;
// do your stuff
}
So you can access the URLs and do your stuff with the WebClient class.
You need to add a reference to a COM component called Microsoft Internet Controls
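A rough sketch combining the two, assuming the COM reference is added; note that ShellWindows also lists Windows Explorer windows, so the URLs are filtered here:
using System.Net;
using SHDocVw;

static string DownloadFirstTabHtml()
{
    var allBrowsers = new ShellWindows();
    foreach (InternetExplorer ieInst in allBrowsers)
    {
        // Skip file-system Explorer windows; keep only http(s) pages.
        if (ieInst.LocationURL.StartsWith("http"))
        {
            using (var web = new WebClient())
            {
                return web.DownloadString(ieInst.LocationURL);
            }
        }
    }
    return null;
}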
Are you talking about getting URLs from the IE window? If so, here you are:
var urls = (new SHDocVw.ShellWindows()).Cast<SHDocVw.InternetExplorer>().
Select(x => x.LocationURL).ToArray();
Don't forget to add COM reference "Microsoft Internet Controls" in your project.
I need to download PDF files, but when I click on the links my browser opens them; I don't get a download window (Save/Save As/Open).
I am using WatiN to log in with my username/password and then click the links. I can't use WebRequest to get these files because I would need to set cookies, and I can't get the cookies from the WatiN browser (in this case).
My code:
using (var browser = new IE("https://www.test.com"))
{
    browser.GoTo(Link);
    int response = URLDownloadToFile(0, Link, FilePath, 0, 0);
}
For links that open a download window (Save/Save As/Open) everything works, but here my browser just opens the file in the browser and I can't save it.
How can I save the PDF file with URLDownloadToFile?
You could use a WebClient; this is the simplest way I can think of:
using (var webClient = new WebClient())
{
    webClient.DownloadFile("https://www.test.com", @"C:\test.pdf");
}
You can also add a Proxy and Network Credentials if you need them.
EDIT: About the cookie stuff, you can also add those to the WebClient.
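A minimal sketch of sending a cookie along with the download; the cookie name and value here are placeholders you would fill with the session cookies from the logged-in browser:
using (var webClient = new WebClient())
{
    // Placeholder cookie header; replace with the real session cookies.
    webClient.Headers.Add(HttpRequestHeader.Cookie, "SESSIONID=abc123");
    webClient.DownloadFile("https://www.test.com", @"C:\test.pdf");
}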
I have an HTML file on my hard disk; I saved the content from a website with WebClient.
private void DownloadHtml()
{
using (var client = new WebClient())
{
client.DownloadFile(webSite, OriginalHtmlFilePath);
}
}
Now, after making some changes to the file content (I changed only some text, no tags or scripts), I want to load the HTML file back. So I did:
string html = File.ReadAllText(ScrambledHtmlFilePath);
Uri uri = new Uri(ScrambledHtmlFilePath);
//webBrowser1.DocumentText = html;
webBrowser1.Navigate(uri);
In both cases, using the HTML string or the Uri, it loads the HTML as a local file and therefore I get some script errors.
If I open the file from my hard disk with Chrome or IE, it loads the file as if I had surfed to the site online, and I don't get any script errors.
The problem is that when I use Chrome or IE it takes about 10-15 seconds until the file loads.
How can I load the HTML file in the WebBrowser quickly and have it behave as if it were online, as when I open it with IE or Chrome?
You can set the DocumentText property of the WebBrowser control to the edited HTML content.
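A minimal sketch using the names from the question; injecting a <base> tag is one possible way (an assumption, not part of the original answer) to keep relative script and image URLs resolving against the original site when the HTML is fed in as a string:
// webSite and ScrambledHtmlFilePath are the variables from the question.
string html = System.IO.File.ReadAllText(ScrambledHtmlFilePath);
// Point relative URLs back at the original site (assumes the markup contains a <head> tag).
html = html.Replace("<head>", "<head><base href=\"" + webSite + "\" />");
webBrowser1.DocumentText = html;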
I have to download and parse a website that is rendered by ASP.NET. If I use the code below, I only get half of the page, without the rendered "content" that I need. I would like to get the full content that I can see with Firebug or the IE Developer Tools.
How can I do this? I didn't find a solution.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse response = (HttpWebResponse)req.GetResponse();
StreamReader streamReader = new StreamReader(response.GetResponseStream());
string code = streamReader.ReadToEnd();
Thank you!
UPDATE
I tried the web control solution, but it didn't work. I have a WPF project and use the following code, and I don't even get the content of the website. I don't see my mistake right now :(
System.Windows.Forms.WebBrowser webBrowser = new System.Windows.Forms.WebBrowser();
Uri uri = new Uri(myAdress);
webBrowser.AllowNavigation = true;
webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
webBrowser.Navigate(uri);
private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
System.Windows.Forms.WebBrowser wb = sender as System.Windows.Forms.WebBrowser;
string tmp = wb.DocumentText;
}
UPDATE 2
That's the code I came up with in the meantime.
However I don't get any output. My elementCollection doesn't return any values.
If I can get the HTML source as a string, I'd be happy and would parse it with the HtmlAgilityPack.
(I don't want to incorporate the browser into my XAML code.)
Sorry for getting on your nerves!
Thank you!
WebBrowser wb = new WebBrowser();
wb.Source = new Uri(MyURL);
HTMLDocument doc = (HTMLDocument)wb.Document;
IHTMLElementCollection elementCollection = doc.getElementsByName("body");
foreach (IHTMLElementCollection element in elementCollection)
{
tb.Text = element.toString();
}
If the page you're referring to has iframes or other dynamic loading mechanisms, the use of HttpWebRequest wouldn't be enough. A better solution would be (if possible) to use a WebBrowser control.
The answer might be that the content of the web site is rendered with JavaScript, probably with some AJAX calls that fetch additional data from the server to build the content. Firebug and the IE Developer Tools will show you the rendered HTML code, but if you choose 'view source', you should see the same HTML as the one that you fetch with the code.
I would use a tool like the Fiddler Web Debugger to monitor what the page downloads when it is rendered. You might be able to get the needed content by simulating the AJAX requests that the page makes.
Note that it can be a b*tch to simulate browsing an ASP.NET web site if the navigation is done with postbacks, because you will need to include the values of all the form elements (including the hidden view state) when simulating clicks on links.
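A rough sketch of simulating one such request; the endpoint, control name and view state below are placeholders you would replace with whatever Fiddler shows:
using System.Collections.Specialized;
using System.Net;
using System.Text;

static string PostLikeTheBrowser(string endpoint, string viewState)
{
    using (var client = new WebClient())
    {
        var fields = new NameValueCollection
        {
            { "__VIEWSTATE", viewState },          // hidden field copied from the page
            { "__EVENTTARGET", "ctl00$someLink" }  // placeholder control name
        };
        byte[] response = client.UploadValues(endpoint, "POST", fields);
        return Encoding.UTF8.GetString(response);
    }
}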
Probably not an answer, but you might use the WebClient class to simplify your code:
WebClient client = new WebClient();
string html = client.DownloadString(URL);
Your code should be downloading the entire page. However, the page may, through JavaScript, add content after it's been loaded. Unless you actually run that JavaScript in a web browser, you won't see the entire DOM you see in Firebug.
You can try this:
protected override void Render(HtmlTextWriter writer)
{
StringBuilder renderedOutput = new StringBuilder();
StringWriter strWriter = new StringWriter(renderedOutput);
HtmlTextWriter tWriter = new HtmlTextWriter(strWriter);
base.Render(tWriter);
string html = tWriter.InnerWriter.ToString();
string filename = Server.MapPath(".") + "\\data.txt";
FileStream outputStream = new FileStream(filename, FileMode.Create);
StreamWriter sWriter = new StreamWriter(outputStream);
sWriter.Write(renderedOutput.ToString());
sWriter.Flush();
sWriter.Close();
//render for output
writer.Write(renderedOutput.ToString());
}
I recommend using the following rendering engine instead of the WebBrowser control:
https://github.com/cefsharp/CefSharp
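A rough sketch of grabbing the rendered HTML with the CefSharp.OffScreen package; initialization details vary between versions, so treat this as an outline rather than a drop-in solution:
using System.Threading.Tasks;
using CefSharp;
using CefSharp.OffScreen;

static async Task<string> GetRenderedHtmlAsync(string url)
{
    Cef.Initialize(new CefSettings());
    using (var browser = new ChromiumWebBrowser(url))
    {
        var tcs = new TaskCompletionSource<bool>();
        browser.LoadingStateChanged += (s, e) =>
        {
            // Fires with IsLoading == false once the page and its scripts have loaded.
            if (!e.IsLoading) tcs.TrySetResult(true);
        };
        await tcs.Task;
        return await browser.GetSourceAsync();  // HTML after JavaScript has run
    }
}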