Azure Website and Worker Role html to pdf - c#

I have tried several .NET pdf libraries to create a pdf page from a html page.
Well in Azure it's not working for a website because i'm receiving a timeout.
I found on the web people are talking about running the pdf converting as a worker role.
Anyone knows how to configure a worker role to work with the azure website.
I cannot find much info on the web about this. Is it possible?

You're conflating things. Azure Websites is a service. Azure worker role is a stateless virtual machine running in a Cloud Service. They are two separate things. Plus, you do not need a worker role to generate PDFs (though it's certainly a viable option). You simply need the ability to install your PDF-rendering software, whether that's in Windows or in Linux.
You will not be able to install such software in Azure Websites, but you can install software in Azure web/worker roles (via startup scripts) or Virtual Machines (via ssh/rdp). Which you choose, as well as the PDF library you choose, is completely up to you (and out of scope here, since that level of architecture is subjective).

EVO has the solution for converting HTML to PDF in Azure Websites. Here is the code snippet for creating PDF from HTML in Azure Websites:
protected void convertToPdfButton_Click(object sender, EventArgs e)
{
// Get the server IP and port
String serverIP = textBoxServerIP.Text;
uint serverPort = uint.Parse(textBoxServerPort.Text);
// Create a HTML to PDF converter object with default settings
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter(serverIP, serverPort);
// Set optional service password
if (textBoxServicePassword.Text.Length > 0)
htmlToPdfConverter.ServicePassword = textBoxServicePassword.Text;
// Set HTML Viewer width in pixels which is the equivalent in converter of the browser window width
htmlToPdfConverter.HtmlViewerWidth = int.Parse(htmlViewerWidthTextBox.Text);
// Set HTML viewer height in pixels to convert the top part of a HTML page
// Leave it not set to convert the entire HTML
if (htmlViewerHeightTextBox.Text.Length > 0)
htmlToPdfConverter.HtmlViewerHeight = int.Parse(htmlViewerHeightTextBox.Text);
// Set PDF page size which can be a predefined size like A4 or a custom size in points
// Leave it not set to have a default A4 PDF page
htmlToPdfConverter.PdfDocumentOptions.PdfPageSize = SelectedPdfPageSize();
// Set PDF page orientation to Portrait or Landscape
// Leave it not set to have a default Portrait orientation for PDF page
htmlToPdfConverter.PdfDocumentOptions.PdfPageOrientation = SelectedPdfPageOrientation();
// Set the maximum time in seconds to wait for HTML page to be loaded
// Leave it not set for a default 60 seconds maximum wait time
htmlToPdfConverter.NavigationTimeout = int.Parse(navigationTimeoutTextBox.Text);
// Set an adddional delay in seconds to wait for JavaScript or AJAX calls after page load completed
// Set this property to 0 if you don't need to wait for such asynchcronous operations to finish
if (conversionDelayTextBox.Text.Length > 0)
htmlToPdfConverter.ConversionDelay = int.Parse(conversionDelayTextBox.Text);
// The buffer to receive the generated PDF document
byte[] outPdfBuffer = null;
if (convertUrlRadioButton.Checked)
{
string url = urlTextBox.Text;
// Convert the HTML page given by an URL to a PDF document in a memory buffer
outPdfBuffer = htmlToPdfConverter.ConvertUrl(url);
}
else
{
string htmlString = htmlStringTextBox.Text;
string baseUrl = baseUrlTextBox.Text;
// Convert a HTML string with a base URL to a PDF document in a memory buffer
outPdfBuffer = htmlToPdfConverter.ConvertHtml(htmlString, baseUrl);
}
// Send the PDF as response to browser
// Set response content type
Response.AddHeader("Content-Type", "application/pdf");
// Instruct the browser to open the PDF file as an attachment or inline
Response.AddHeader("Content-Disposition", String.Format("{0}; filename=Getting_Started.pdf; size={1}",
openInlineCheckBox.Checked ? "inline" : "attachment", outPdfBuffer.Length.ToString()));
// Write the PDF document buffer to HTTP response
Response.BinaryWrite(outPdfBuffer);
// End the HTTP response and stop the current page processing
Response.End();
}

I'm the author of the Rotativa nuget package. For security policy reasons and OS limitations it can't be used on Azure websites. To address this issue I created a SaaS version on Azure.
It's an API that's really easy to use, just install dedicated nuget package:
PM> Install-Package RotativaHQ
And create PDF files from Razor Views, as easy as:
return new ViewAsPdf(model);
No need for anything special in the View/HTML. No need for absolute URLS in images/css/js links. Works on localhost (dev machine) too.
Currently the service has endpoints in 4 Azure regions: US East, US West, EU North, Southeast Asia.
It's fast since it uses a proprietary protocol to send the web page contents to the API for conversion to PDF.
It's reliable because all endpoints are load balanced.
Details on the web site:
https://rotativahq.com

Related

Fine Uploader session Thumbnails slow to load

I'm creating a mockup file upload tool for a community site using Fine Uploader.
I've got the session set up to retrieve the initial files from the server along with a thumbnail url.
It all works great, however the rendering of the thumbnails is really slow.
I can't work out why. So I hard-coded to use a very small thumbnail for each of the four files. This made no difference.
The server side not the issue. The information is coming back very quickly.
Am I doing something wrong? Why is fineuploader so slow? Here's screen grab. It's taking four seconds to render the four thumbnails.
I'm using latest chrome. It's a NancyFX project on a fairly powerful machine. Rending other pages with big images on them is snappy.
Client side code:
thumbnails: {
placeholders: {
waitingPath: '/Content/js/fine-uploader/placeholders/waiting-generic.png',
notAvailablePath: '/Content/js/fine-uploader/placeholders/not_available-generic.png'
}
},
session: {
endpoint: "/getfiles/FlickaId/342"
},
Server side code:
// Fine uploader makes session request to get existing files
Get["/getfiles/FlickaId/{FlickaId}"] = parameters =>
{
//get the image files from the server
var i = FilesDatabase.GetFlickaImagesById(parameters.FlickaId);
// list to hold the files
var list = new List<UploadedFiles>();
// build the response data object list
foreach (var imageFile in i)
{
var f = new UploadedFiles();
f.name = "test-thumb-small.jpg"; // imageFile.ImageFileName;
f.size = 1;
f.uuid = imageFile.FileGuid;
f.thumbnailUrl = "/Content/images/flickabase/thumbnails/" + "test-thumb-small.jpg"; // imageFile.ImageFileName;
list.Add(f);
}
return Response.AsJson(list); // our model is serialised by Nancy as Json!
};
This is by design, and was implemented both to prevent the UI thread from being flooded with the image scaling logic and to prevent a memory leak issue specific to Chrome. This is explained in the thumbnails and previews section of the documentation, specifically in the "performance considerations" area:
For browsers that support client-generated image previews (qq.supportedFeatures.imagePreviews === true), a configurable pause between template-generated previews is in effect. This is to prevent the complex process of generating previews from overwhelming the client machine's CPU for a lengthy amount of time. Without this limit in place, the browser's UI thread runs the risk of blocking, preventing any user interaction (scrolling, etc) until all previews have been generated.
You can adjust or remove this pause via the thumbnails option, but I suggest you not do this unless you are sure users will not drop a large number of complex image files.

C#: HttpListener Error Serving Content

I have implemented something similar to this
only real difference is
string filename = context.Request.RawUrl.Replace("/", "\\").Remove(0,1);
string path = Uri.UnescapeDataString(Path.Combine(_baseFolder, filename));
so that I can traverse to subdirectories. This works great for webpages and other text file types but when trying to serve up media content I get the exception
HttpListenerException: The I/O
operation has been aborted because of
either a thread exit or an application
request
Followed by
InvalidOperationException: Cannot close stream until all bytes are written.
In the using statement.
Any suggestions on how to handle this or stop these exceptions?
Thanks
I should mention that I am using Google Chrome for my browser (Google Chrome doesn't seem to care about the MIME types, when it sees audio it will try to use it like it's in a HTML5 player), but this is also applicable if you are trying to host media content in a page.
Anyways, I was inspecting my headers with fiddler and noticed that Chrome passes 3 requests to the server. I started playing with other browsers and noticed they did not do this, but depending on the browser and what I had hard coded as the MIME type I would either get a page of crazy text, or a download of the file.
On further inspection I noticed that chrome would first request the file. Then request the file again with a few different headers most notably the range header. The first one with byte=0- then the next with a different size depending on how large the file was (more than 3 requests can be made depending how large the file is).
So there was the problem. Chrome will first ask for the file. Once seeing the type it would send another request which seems to me looking for how large the file is (byte=0-) then another one asking for the second half of the file or something similar to allow for a sort of streaming experienced when using HTML5. I coded something quickly up to handle MIME types and threw a HTML5 page together with the audio component and found that other browsers also do this (except IE)
So here is a quick solution and I no longer get these errors
string range = context.Request.Headers["Range"];
int rangeBegin = 0;
int rangeEnd = msg.Length;
if (range != null)
{
string[] byteRange = range.Replace("bytes=", "").Split('-');
Int32.TryParse(byteRange[0], out rangeBegin);
if (byteRange.Length > 1 && !string.IsNullOrEmpty(byteRange[1]))
{
Int32.TryParse(byteRange[1], out rangeEnd);
}
}
context.Response.ContentLength64 = rangeEnd - rangeBegin;
using (Stream s = context.Response.OutputStream)
{
s.Write(msg, rangeBegin, rangeEnd - rangeBegin);
}
Try:
using (Stream s = context.Response.OutputStream)
{
s.Write(msg, 0, msg.Length);
s.Flush()
}

ASP.NET web page to PDF

I looking for the easiest way to export the currently viewed asp.net web page to a PDF document using iTextSharp - it can be either a screenshot of it or passing in the url to generate the document. Sample code would be greatly appreciated. Many thanks in advance!
On a past project, we used Supergoo ABCPDF
to do something like you need to and it worked quite well. You basicly feed it an url and it process the HTML page into a PDF.
On the downside, it's a licensed software with costs associated to it and we had some performance issues when exporting a lot of large PDF at the same time.
Hope this helps !
Adding to what Darin said in his comment.
You can try using wkhtmltopdf to generate PDF files. It takes URL as input. This is what I have used for my SO application so2pdf.
It is not that easy (or i think so), i had same problem and i had to write code to generate exact page in pdf. It depends of page and used styles etc. So i create drawing of each element.
For some project i used Winnovative HTML to PDF converter but it is not free.
Give a try with PDFSharp (to generate PDF files at runtime), and it's Open Source library :
http://pdfsharp.com/PDFsharp/index.php?option=com_content&task=view&id=27&Itemid=1
Alternative solution : http://www.bratched.com/en/component/content/article/76-how-to-convert-xhtml-to-pdf-in-c.html
There is an example Convert the Current HTML Page to PDF on WInnovative website which does exactly this. The relevant C# code to convert the currently viewed asp.net web page to a PDF document is:
// Controls if the current HTML page will be rendered to PDF or as a normal page
bool convertToPdf = false;
protected void convertToPdfButton_Click(object sender, EventArgs e)
{
// The current ASP.NET page will be rendered to PDF when its Render method will be called by framework
convertToPdf = true;
}
protected override void Render(HtmlTextWriter writer)
{
if (convertToPdf)
{
// Get the current page HTML string by rendering into a TextWriter object
TextWriter outTextWriter = new StringWriter();
HtmlTextWriter outHtmlTextWriter = new HtmlTextWriter(outTextWriter);
base.Render(outHtmlTextWriter);
// Obtain the current page HTML string
string currentPageHtmlString = outTextWriter.ToString();
// Create a HTML to PDF converter object with default settings
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
// Set license key received after purchase to use the converter in licensed mode
// Leave it not set to use the converter in demo mode
htmlToPdfConverter.LicenseKey = "fvDh8eDx4fHg4P/h8eLg/+Dj/+jo6Og=";
// Use the current page URL as base URL
string baseUrl = HttpContext.Current.Request.Url.AbsoluteUri;
// Convert the current page HTML string a PDF document in a memory buffer
byte[] outPdfBuffer = htmlToPdfConverter.ConvertHtml(currentPageHtmlString, baseUrl);
// Send the PDF as response to browser
// Set response content type
Response.AddHeader("Content-Type", "application/pdf");
// Instruct the browser to open the PDF file as an attachment or inline
Response.AddHeader("Content-Disposition", String.Format("attachment; filename=Convert_Current_Page.pdf; size={0}", outPdfBuffer.Length.ToString()));
// Write the PDF document buffer to HTTP response
Response.BinaryWrite(outPdfBuffer);
// End the HTTP response and stop the current page processing
Response.End();
}
else
{
base.Render(writer);
}
}

Generate PDF from ASP.NET from raw HTML/CSS content?

I'm sending emails that have invoices attached as PDFs. I'm already - elsewhere in the application - creating the invoices in an .aspx page. I'd like to use Server.Execute to return the output HTML and generate a PDF from that. Otherwise, I'd have to use a reporting tool to "draw" the invoice on a PDF. That blows for lots of reasons, not the least of which is that I'd have to update both the .aspx page and the report for every minor change. What to do...
There is no way to generate a PDF from an HTML string directly within .NET, but there are number of third party controls that work well.
I've had success with this one: http://www.html-to-pdf.net
and this: http://www.htmltopdfasp.net
The important questions to ask are:
Does it render correctly as compared to the 3 major browsers: IE, FF and Safari/Chrome?
Does it handle CSS fine?
Does the control have it's own rendering engine? If so, bounce it. You don't want to trust a home grown rendering engine - the browsers have a hard enough problem getting everything pixel perfect.
What dependencies does the third party control require? The fewer, the better.
There are a few others but they deal with ActiveX displays and such.
We use a product called ABCPDF for this and it works fantastic.
http://www.websupergoo.com/abcpdf-1.htm
This sounds like a job for Prince. It can take HTML and CSS and generate a PDF, which you can then present to your users. It supports CSS3 better than most web browsers (staff include HÃ¥kon Wium Lie, the inventor of CSS).
See the samples, especially the ones for Wikipedia pages, for the beautiful output it can generate. There's also an interesting Google Tech Talk with the authors.
Edit: There is a .NET wrapper available.
wkhtmltopdf is a free and cool exe to generate pdf from html. Its written in c++. But nReco htmltopdf is a wrapper dotnet library for this awesome tool. I implemented using this dotnet library and it was just so good it does everything by its own you just need to give html as a data source.
/// <summary>
/// Converts html into PDF using nReco dll and wkhtmltopdf.exe.
/// </summary>
private byte[] ConvertHtmlToPDF()
{
HtmlToPdfConverter nRecohtmltoPdfObj = new HtmlToPdfConverter();
nRecohtmltoPdfObj.Orientation = PageOrientation.Portrait;
nRecohtmltoPdfObj.PageFooterHtml = CreatePDFFooter();
nRecohtmltoPdfObj.CustomWkHtmlArgs = "--margin-top 35 --header-spacing 0 --margin-left 0 --margin-right 0";
return nRecohtmltoPdfObj.GeneratePdf(CreatePDFScript() + ShowHtml() + "</body></html>");
}
The above function is an excerpt from the below link post which explains it in detail.
HTML to PDF in ASP.Net
The initial question is about converting another aspx page containing an invoice to a PDF document. The invoice is probably using some session data and the user suggests to use Server.Execute() to obtain the invoice page HTML code and then to convert that code to PDF. Converting the invoice page URL directly is not possible because a new session would be created during conversion and the session data would be lost.
This is actually a good technique to preserve session data during conversion which is applied in Convert a HTML Page to PDF in Same Session ASP.NET Demo of the EvoPdf library. The complete C# code to get the HTML string rendered by the invoice page and to convert that string to PDF is:
// Execute the invoice page and get the HTML string rendered by this page
TextWriter outTextWriter = new StringWriter();
Server.Execute("Invoice.aspx", outTextWriter);
string htmlStringToConvert = outTextWriter.ToString();
// Create a HTML to PDF converter object with default settings
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
// Use the current page URL as base URL
string baseUrl = HttpContext.Current.Request.Url.AbsoluteUri;
// Convert the page HTML string to a PDF document in a memory buffer
byte[] outPdfBuffer = htmlToPdfConverter.ConvertHtml(htmlStringToConvert, baseUrl);
// Send the PDF as response to browser
// Set response content type
Response.AddHeader("Content-Type", "application/pdf");
// Instruct the browser to open the PDF file as an attachment or inline
Response.AddHeader("Content-Disposition", String.Format("attachment; filename=Convert_Page_in_Same_Session.pdf; size={0}", outPdfBuffer.Length.ToString()));
// Write the PDF document buffer to HTTP response
Response.BinaryWrite(outPdfBuffer);
// End the HTTP response and stop the current page processing
Response.End();
As long as you can make sure to use proper XHTML, you could also use a product like Alt-Soft's Xml2PDF to convert XML (XHTML) into PDF by means of XSLT/XSL-FO.
It takes a bit of a learning curve to master, but it works very well once you've "got" it!
Marc
Since you are producing the answer, you can use a tool like Report.NET:
http://sourceforge.net/projects/report/
I disagree with the answers that say you cannot convert directly from output to PDF, however, as you can "re-call" the page and get the HTML as a stream and convert it. I am not sure what tool you would want to use to do this, however. In other words, it is possible, but I am not sure it is worth it. The PDF creation libs, like Report.NET, even though they force reusing some logic and no automagic converrsion, it is easier.
I have not tried this component, but I have heard good things about it from those who have. The model is more like HTML, but I am not sure you can simply send a rendered ASPX to it to create PDF:
http://www.websupergoo.com/abcpdf-8.htm
If you try to find some html to pdf software via GOOGLE you'll get a pile of this stuff.
There are about 10 leaders but most of them use IE dlls in background mode.
Just couple of them use their own parsing engine.
Please try PDF Duo .NET component in your ASP.NET project if you wish to create a PDF programaticaly.
It is light component for a cool generating of PDF invoces, reports e.g.
I'd go a different route. Assuming you are using SQL Server, use SSRS and generate the PDF that way.
A possible minimal solution to use Server.Execute() to obtain the HTML of the invoice page and convert that code to a PDF using winnovative html to pdf api for .net is:
TextWriter outTextWriter = new StringWriter();
Server.Execute("Invoice.aspx", outTextWriter);
HtmlToPdfConverter htmlToPdfConverter = new HtmlToPdfConverter();
byte[] pdfBytes = htmlToPdfConverter.ConvertHtml(outTextWriter.ToString(),
httpContext.Current.Request.Url.AbsoluteUri);
You can use PDFSharp or iTextSharp to convert html to pdf. PDFSharp is not free.

How do I print an HTML document from a web service?

I want to print HTML from a C# web service. The web browser control is overkill, and does not function well in a service environment, nor does it function well on a system with very tight security constraints. Is there any sort of free .NET library that will support the printing of a basic HTML page? Here is the code I have so far, which does not run properly.
public void PrintThing(string document)
{
if (Thread.CurrentThread.GetApartmentState() != ApartmentState.STA)
{
Thread thread =
new Thread((ThreadStart) delegate { PrintDocument(document); });
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
}
else
{
PrintDocument(document);
}
}
protected void PrintDocument(string document)
{
WebBrowser browser = new WebBrowser();
browser.DocumentText = document;
while (browser.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
browser.Print();
}
This works fine when called from UI-type threads, but nothing happens when called from a service-type thread. Changing Print() to ShowPrintPreviewDialog() yields the following IE script error:
Error: dialogArguments.___IE_PrintType is null or not an object.
URL: res://ieframe.dll/preview.dlg
And a small empty print preview dialog appears.
You can print from the command line using the following:
rundll32.exe
%WINDIR%\System32\mshtml.dll,PrintHTML
"%1"
Where %1 is the file path of the HTML file to be printed.
If you don't need to print from memory (or can afford to write to the disk in a temp file) you can use:
using (Process printProcess = new Process())
{
string systemPath = Environment.GetFolderPath(Environment.SpecialFolder.System);
printProcess.StartInfo.FileName = systemPath + #"\rundll32.exe";
printProcess.StartInfo.Arguments = systemPath + #"\mshtml.dll,PrintHTML """ + fileToPrint + #"""";
printProcess.Start();
}
N.B. This only works on Windows 2000 and above I think.
I know that Visual Studio itself (at least in 2003 version) references the IE dll directly to render the "Design View".
It may be worth looking into that.
Otherwise, I can't think of anything beyond the Web Browser control.
Easy! Split your problem into two simpler parts:
render the HTML to PDF
print the PDF (SumatraPDF)
-print-to-default $file.pdf prints a PDF file on a default printer
-print-to $printer_name $file.pdf prints a PDF on a given printer
If you've got it in the budget (~$3000), check out PrinceXML.
It will render HTML into a PDF, functions well in a service environment, and supports advanced features such as not breaking a page in the middle of a table cell (which a lot of browsers don't currently support).
I tool that works very well for me is HiQPdf. https://www.hiqpdf.com/
The price is reasonable (starts at $245) and it can render HTML to a PDF and also manage the printing of the PDF files directly.
Maybe this will help. http://www.codeproject.com/KB/printing/printhml.aspx
Also not sure what thread you are trying to access the browser control from, but it needs to be STA
Note - The project referred to in the link does allow you to navigate to a page and perform a print without showing the print dialog.
I don't know the specific tools, but there are some utilities that record / replay clicks. In other words, you could automate the "click" on the print dialog. (I know this is a hack, but when all else fails...)

Categories