c# IE Plugin BHO get the pdf - c#

I need help with a small IE plugin (Browser Helper Object).
What the plugin should do:
If the user clicks a link that points to a PDF, the plugin should call an exe file installed on the computer. The exe would check the PDF and open either the default PDF application or a special one.
What I have done:
My BHO starts with IE. I used this code as a starting point:
http://www.codeproject.com/Articles/19971/How-to-attach-to-Browser-Helper-Object-BHO-with-C
I disabled all Adobe plugins so that the IE download window shows up. With this code I can parse the HTML body, add HTML markup, etc., but that's not what I'm trying to do...
My problem:
I don't know how to grab the PDF. If I navigate directly to a PDF download link, the cast of the site object to an InternetExplorer or WebBrowser object fails.
public int SetSite(object site)
{
    if (site != null)
    {
        // "as" yields null when the cast fails, so check before subscribing.
        ieInstance = site as InternetExplorer;
        if (ieInstance != null)
        {
            ieInstance.DocumentComplete += new DWebBrowserEvents2_DocumentCompleteEventHandler(this.OnDocumentComplete);
        }
    }
    else if (ieInstance != null)
    {
        ieInstance.DocumentComplete -= new DWebBrowserEvents2_DocumentCompleteEventHandler(this.OnDocumentComplete);
    }
    return 0;
}
Document = '((SHDocVw.InternetExplorer)(ieInstance)).Document' threw an exception of type 'System.Runtime.InteropServices.COMException'
Could someone tell me how I can grab the PDF before the IE download window appears? I know there is a before-download event, but that event doesn't help me either.

What are you trying to achieve more broadly?
You can't grab an HTML document instance when the browser is showing a PDF file because there isn't one: the PDF object is loaded instead of the HTML-rendering object.
Depending on what you're trying to achieve, there are likely better ways to accomplish it (most likely by registering a MIME Filter for the application/pdf MIME type).

Related

ASP.NET converting HTML to PDF

I have tried multiple plugins and C# classes to convert the HTML and CSS in my ASP.NET project to a PDF. Even though the code looks fine and the button click works for other functions, I cannot get any HTML-to-PDF function to work. Has anyone else encountered this, or does anyone know what I have missed?
This is the latest code I have tried, using HiQPdf in C#:
protected void Print_Button_Click(object sender, EventArgs e)
{
    HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
    // set PDF page size, orientation and margins
    htmlToPdfConverter.Document.PageSize = PdfPageSize.A4;
    htmlToPdfConverter.Document.PageOrientation = PdfPageOrientation.Portrait;
    htmlToPdfConverter.Document.Margins = new PdfMargins(0);
    // convert HTML to PDF
    htmlToPdfConverter.ConvertUrlToFile("http://localhost:51091/Printout", "mcn.pdf");
}
It is not stated directly in the HiQPdf documentation of the method, but ConvertUrlToFile() stores the produced PDF file locally on disk. An example page (Convert URLs and HTML Code to PDF) contains the following comment:
// ConvertUrlToFile() is called to convert the html document and save the resulted PDF into a file on disk
// Alternatively, ConvertUrlToMemory() can be called to save the resulted PDF in a buffer in memory
htmlToPdfConverter.ConvertUrlToFile(url, pdfFile);
Since your example shows a button-click event handler, the file is probably generated but never returned in the HTTP response. You have to write the data into the response yourself. The stream or memory conversion methods, such as ConvertUrlToMemory(), should come in handy here. Don't forget to call Response.Clear() or Response.ClearContent() and Response.ClearHeaders() before writing, and Flush() and Close() afterwards.
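A minimal sketch of that idea, assuming ConvertUrlToMemory() takes the URL and returns the PDF as a byte array (check this against your HiQPdf version). The page class name Printout is hypothetical; the URL and file name are taken from the question.

```csharp
using System;
using HiQPdf;

// Hypothetical code-behind class; only the button handler matters here.
public partial class Printout : System.Web.UI.Page
{
    protected void Print_Button_Click(object sender, EventArgs e)
    {
        HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
        htmlToPdfConverter.Document.PageSize = PdfPageSize.A4;

        // Assumption: ConvertUrlToMemory(url) returns the rendered PDF as byte[].
        byte[] pdf = htmlToPdfConverter.ConvertUrlToMemory("http://localhost:51091/Printout");

        // Replace whatever the page has rendered so far with the PDF bytes.
        Response.Clear();
        Response.ContentType = "application/pdf";
        Response.AddHeader("Content-Disposition", "attachment; filename=mcn.pdf");
        Response.BinaryWrite(pdf);
        Response.Flush();
        Response.End();
    }
}
```

Using "inline" instead of "attachment" in the Content-Disposition header asks the browser to display the PDF rather than download it.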

Download the html source code to a string from a open website in IE with C#?

I've been looking all over for this answer but can't find it anywhere.
This is what I want to be able to do:
I have a form application where I have a button that says "collect html code". When I press this button, I want C# to download the HTML source code of the website I'm currently on (using IE). I've been using this code:
WebClient web = new WebClient();
string html = web.DownloadString("http://www.example.com");
But now I don't want to specify the URL in my code! And I don't want to use a webbrowser in my application.
Anyone got a solution?
Thanks!
With this code you can get the URLs of the open tabs in IE7 and later:
SHDocVw.ShellWindows allBrowsers = new SHDocVw.ShellWindows();
foreach (SHDocVw.InternetExplorer ieInst in allBrowsers)
{
    String url = ieInst.LocationURL;
    // do your stuff
}
So you can get the URLs and then do your stuff with the WebClient class.
You need to add a reference to the COM component called Microsoft Internet Controls.
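Putting the two pieces together might look like the sketch below. It assumes the Microsoft Internet Controls COM reference is added, and it skips entries whose URL does not start with "http", on the assumption that ShellWindows can also return non-browser windows. The class name is made up for illustration.

```csharp
using System;
using System.Net;

class OpenTabDownloader
{
    static void Main()
    {
        SHDocVw.ShellWindows allBrowsers = new SHDocVw.ShellWindows();
        using (WebClient web = new WebClient())
        {
            foreach (SHDocVw.InternetExplorer ieInst in allBrowsers)
            {
                string url = ieInst.LocationURL;
                // Skip empty entries and anything that is not a web page.
                if (string.IsNullOrEmpty(url) ||
                    !url.StartsWith("http", StringComparison.OrdinalIgnoreCase))
                {
                    continue;
                }
                // Re-request the page; this is a fresh download, not IE's copy.
                string html = web.DownloadString(url);
                Console.WriteLine("{0}: {1} characters", url, html.Length);
            }
        }
    }
}
```

Because WebClient issues a new request without IE's cookies or session, the HTML may differ from what the browser is showing, for example on pages that require a login.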
Are you talking about getting the URLs from the IE windows? If so, here you go:
var urls = (new SHDocVw.ShellWindows()).Cast<SHDocVw.InternetExplorer>()
    .Select(x => x.LocationURL).ToArray();
Don't forget to add COM reference "Microsoft Internet Controls" in your project.

Printing transformed XML

Due to circumstances beyond my control, I'm replacing all of our Crystal Reports with home-built XML reports, which are working beautifully. For most of the reports that pop up a Crystal Reports viewer, the following code opens them nicely in IE, transforming it to HTML via an XSLT stylesheet.
ProcessStartInfo psi = new ProcessStartInfo(reportFilename)
{
    UseShellExecute = true
};
using (Process p = new Process {StartInfo = psi})
{
    p.Start();
}
The problem is that some of the reports just print directly to a printer, never showing the report to the user, which works fine in CR. I can't figure out how to do this using the code above.
I'd prefer not to specifically launch an IE process if possible, but I am guaranteed that they're running Windows, so that's not a hard requirement. Also, will printing directly in this manner transform the XML into HTML via the XSL and print that, or just print the actual XML text?
EDIT: I've already tried adding:
Verb = "Print"
to the ProcessStartInfo object, but that winds up with an exception being thrown saying:
"No application is associated with the specified file for this operation"
EDIT AGAIN: Specifying IE as the exe to launch loaded the XML again just fine, but does not offer a "print" action. Adding "window.print()" in a JavaScript block works, but requires manually clicking the print button after allowing the script to run, because IE blocks it.
EDIT THE THIRD: My boss told me not to worry about it, that they can print from IE. I still want to figure this out. I've tried the command line "print.exe", but that only prints the raw XML to the printer. Tried XslCompiledTransform with a PrintDocument, but that's not what I'm looking for, either.
Figured it out, finally. I just created an invisible WebBrowser control that does the IE rendering, and on DocumentCompleted, call its Print() method. Worked like a charm using the default printer settings.
private static void PrintReport(string reportFilename)
{
    WebBrowser browser = new WebBrowser();
    browser.DocumentCompleted += browser_DocumentCompleted;
    browser.Navigate(reportFilename);
}
private static void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = sender as WebBrowser;
    if (null == browser)
    {
        return;
    }
    browser.Print();
    browser.Dispose();
}

Generate PDF from ASP.NET from raw HTML/CSS content?

I'm sending emails that have invoices attached as PDFs. I'm already - elsewhere in the application - creating the invoices in an .aspx page. I'd like to use Server.Execute to return the output HTML and generate a PDF from that. Otherwise, I'd have to use a reporting tool to "draw" the invoice on a PDF. That blows for lots of reasons, not the least of which is that I'd have to update both the .aspx page and the report for every minor change. What to do...
There is no way to generate a PDF from an HTML string directly within .NET, but there are a number of third-party controls that work well.
I've had success with this one: http://www.html-to-pdf.net
and this: http://www.htmltopdfasp.net
The important questions to ask are:
Does it render correctly as compared to the 3 major browsers: IE, FF and Safari/Chrome?
Does it handle CSS fine?
Does the control have its own rendering engine? If so, bounce it. You don't want to trust a home-grown rendering engine - the browsers have a hard enough time getting everything pixel perfect.
What dependencies does the third party control require? The fewer, the better.
There are a few others but they deal with ActiveX displays and such.
We use a product called ABCPDF for this and it works fantastic.
http://www.websupergoo.com/abcpdf-1.htm
This sounds like a job for Prince. It can take HTML and CSS and generate a PDF, which you can then present to your users. It supports CSS3 better than most web browsers (staff include Håkon Wium Lie, the inventor of CSS).
See the samples, especially the ones for Wikipedia pages, for the beautiful output it can generate. There's also an interesting Google Tech Talk with the authors.
Edit: There is a .NET wrapper available.
wkhtmltopdf is a free exe that generates PDFs from HTML. It's written in C++, but nReco htmltopdf is a .NET wrapper library for this tool. I implemented my solution with this library and it worked very well: it handles everything on its own, and you just need to give it the HTML as the data source.
/// <summary>
/// Converts html into PDF using nReco dll and wkhtmltopdf.exe.
/// </summary>
private byte[] ConvertHtmlToPDF()
{
    HtmlToPdfConverter nRecohtmltoPdfObj = new HtmlToPdfConverter();
    nRecohtmltoPdfObj.Orientation = PageOrientation.Portrait;
    nRecohtmltoPdfObj.PageFooterHtml = CreatePDFFooter();
    nRecohtmltoPdfObj.CustomWkHtmlArgs = "--margin-top 35 --header-spacing 0 --margin-left 0 --margin-right 0";
    return nRecohtmltoPdfObj.GeneratePdf(CreatePDFScript() + ShowHtml() + "</body></html>");
}
The above function is an excerpt from the post linked below, which explains it in detail.
HTML to PDF in ASP.Net
The initial question is about converting another aspx page containing an invoice to a PDF document. The invoice is probably using some session data and the user suggests to use Server.Execute() to obtain the invoice page HTML code and then to convert that code to PDF. Converting the invoice page URL directly is not possible because a new session would be created during conversion and the session data would be lost.
This is actually a good technique to preserve session data during conversion which is applied in Convert a HTML Page to PDF in Same Session ASP.NET Demo of the EvoPdf library. The complete C# code to get the HTML string rendered by the invoice page and to convert that string to PDF is:
// Execute the invoice page and get the HTML string rendered by this page
TextWriter outTextWriter = new StringWriter();
Server.Execute("Invoice.aspx", outTextWriter);
string htmlStringToConvert = outTextWriter.ToString();
// Create a HTML to PDF converter object with default settings
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
// Use the current page URL as base URL
string baseUrl = HttpContext.Current.Request.Url.AbsoluteUri;
// Convert the page HTML string to a PDF document in a memory buffer
byte[] outPdfBuffer = htmlToPdfConverter.ConvertHtml(htmlStringToConvert, baseUrl);
// Send the PDF as response to browser
// Set response content type
Response.AddHeader("Content-Type", "application/pdf");
// Instruct the browser to open the PDF file as an attachment or inline
Response.AddHeader("Content-Disposition", String.Format("attachment; filename=Convert_Page_in_Same_Session.pdf; size={0}", outPdfBuffer.Length.ToString()));
// Write the PDF document buffer to HTTP response
Response.BinaryWrite(outPdfBuffer);
// End the HTTP response and stop the current page processing
Response.End();
As long as you can make sure to use proper XHTML, you could also use a product like Alt-Soft's Xml2PDF to convert XML (XHTML) into PDF by means of XSLT/XSL-FO.
It takes a bit of a learning curve to master, but it works very well once you've "got" it!
Marc
Since you are producing the answer, you can use a tool like Report.NET:
http://sourceforge.net/projects/report/
I disagree with the answers that say you cannot convert directly from the output to PDF, however, since you can "re-call" the page, get the HTML as a stream, and convert it. I am not sure what tool you would want to use for this, however. In other words, it is possible, but I am not sure it is worth it. PDF creation libraries like Report.NET force you to reuse some logic and offer no automagic conversion, but they are easier.
I have not tried this component, but I have heard good things about it from those who have. The model is more like HTML, but I am not sure you can simply send a rendered ASPX to it to create PDF:
http://www.websupergoo.com/abcpdf-8.htm
If you search Google for HTML-to-PDF software, you'll get a pile of this stuff.
There are about 10 leaders, but most of them use IE DLLs in background mode.
Just a couple of them use their own parsing engine.
Please try the PDF Duo .NET component in your ASP.NET project if you wish to create a PDF programmatically.
It is a light component for generating PDF invoices, reports, etc.
I'd go a different route. Assuming you are using SQL Server, use SSRS and generate the PDF that way.
A possible minimal solution that uses Server.Execute() to obtain the HTML of the invoice page and converts it to a PDF with the Winnovative HTML to PDF API for .NET is:
TextWriter outTextWriter = new StringWriter();
Server.Execute("Invoice.aspx", outTextWriter);
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
byte[] pdfBytes = htmlToPdfConverter.ConvertHtml(outTextWriter.ToString(),
    HttpContext.Current.Request.Url.AbsoluteUri);
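You can use PDFsharp or iTextSharp to convert HTML to PDF. PDFsharp is free (MIT license), while newer iTextSharp versions are AGPL-licensed and need a commercial license for closed-source use.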

How do I print an HTML document from a web service?

I want to print HTML from a C# web service. The web browser control is overkill, and does not function well in a service environment, nor does it function well on a system with very tight security constraints. Is there any sort of free .NET library that will support the printing of a basic HTML page? Here is the code I have so far, which does not run properly.
public void PrintThing(string document)
{
    if (Thread.CurrentThread.GetApartmentState() != ApartmentState.STA)
    {
        Thread thread =
            new Thread((ThreadStart) delegate { PrintDocument(document); });
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
    }
    else
    {
        PrintDocument(document);
    }
}
protected void PrintDocument(string document)
{
    WebBrowser browser = new WebBrowser();
    browser.DocumentText = document;
    while (browser.ReadyState != WebBrowserReadyState.Complete)
    {
        Application.DoEvents();
    }
    browser.Print();
}
This works fine when called from UI-type threads, but nothing happens when called from a service-type thread. Changing Print() to ShowPrintPreviewDialog() yields the following IE script error:
Error: dialogArguments.___IE_PrintType is null or not an object.
URL: res://ieframe.dll/preview.dlg
And a small empty print preview dialog appears.
You can print from the command line using the following:
rundll32.exe %WINDIR%\System32\mshtml.dll,PrintHTML "%1"
Where %1 is the file path of the HTML file to be printed.
If you don't need to print from memory (or can afford to write to the disk in a temp file) you can use:
using (Process printProcess = new Process())
{
    string systemPath = Environment.GetFolderPath(Environment.SpecialFolder.System);
    printProcess.StartInfo.FileName = systemPath + @"\rundll32.exe";
    printProcess.StartInfo.Arguments = systemPath + @"\mshtml.dll,PrintHTML """ + fileToPrint + @"""";
    printProcess.Start();
}
N.B. This only works on Windows 2000 and above I think.
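If the HTML only exists in memory, the temp-file route mentioned above could be sketched roughly as follows. The class and method names are made up for illustration, and cleanup of the temp file is left to the caller because it is not obvious when the print job has finished reading it.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class HtmlPrinter
{
    // Writes the HTML to a uniquely named file in the temp folder and returns its path.
    public static string WriteTempHtml(string html)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
        File.WriteAllText(path, html);
        return path;
    }

    // Builds the rundll32 argument string used in the answer above.
    public static string BuildPrintArguments(string systemPath, string fileToPrint)
    {
        return systemPath + @"\mshtml.dll,PrintHTML """ + fileToPrint + @"""";
    }

    // Saves the HTML to disk and hands the file to rundll32 for printing.
    public static void Print(string html)
    {
        string systemPath = Environment.GetFolderPath(Environment.SpecialFolder.System);
        string fileToPrint = WriteTempHtml(html);
        using (Process printProcess = new Process())
        {
            printProcess.StartInfo.FileName = systemPath + @"\rundll32.exe";
            printProcess.StartInfo.Arguments = BuildPrintArguments(systemPath, fileToPrint);
            printProcess.Start();
            printProcess.WaitForExit();
        }
    }
}
```

A call would then look like HtmlPrinter.Print("<html><body>Hello</body></html>");.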
I know that Visual Studio itself (at least in 2003 version) references the IE dll directly to render the "Design View".
It may be worth looking into that.
Otherwise, I can't think of anything beyond the Web Browser control.
Easy! Split your problem into two simpler parts:
render the HTML to PDF
print the PDF (SumatraPDF)
-print-to-default $file.pdf prints a PDF file on a default printer
-print-to $printer_name $file.pdf prints a PDF on a given printer
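Calling SumatraPDF from C# with the two switches listed above could be sketched like this. The path to SumatraPDF.exe is something you have to supply, and the class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;

static class PdfPrinter
{
    // Uses -print-to-default when no printer name is given, otherwise -print-to.
    public static string BuildArguments(string pdfPath, string printerName)
    {
        string quotedPdf = "\"" + pdfPath + "\"";
        return string.IsNullOrEmpty(printerName)
            ? "-print-to-default " + quotedPdf
            : "-print-to \"" + printerName + "\" " + quotedPdf;
    }

    // Starts SumatraPDF with the print switch and waits for it to exit.
    public static void Print(string sumatraExePath, string pdfPath, string printerName)
    {
        ProcessStartInfo psi = new ProcessStartInfo(sumatraExePath, BuildArguments(pdfPath, printerName))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process p = Process.Start(psi))
        {
            p.WaitForExit();
        }
    }
}
```

Quoting both the printer name and the file path keeps names with spaces intact on the command line.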
If you've got it in the budget (~$3000), check out PrinceXML.
It will render HTML into a PDF, functions well in a service environment, and supports advanced features such as not breaking a page in the middle of a table cell (which a lot of browsers don't currently support).
A tool that works very well for me is HiQPdf. https://www.hiqpdf.com/
The price is reasonable (starts at $245) and it can render HTML to a PDF and also manage the printing of the PDF files directly.
Maybe this will help. http://www.codeproject.com/KB/printing/printhml.aspx
Also not sure what thread you are trying to access the browser control from, but it needs to be STA
Note - The project referred to in the link does allow you to navigate to a page and perform a print without showing the print dialog.
I don't know the specific tools, but there are some utilities that record / replay clicks. In other words, you could automate the "click" on the print dialog. (I know this is a hack, but when all else fails...)
