HTML to PDF conversion using Aspose - c#

I am new to Aspose but I have successfully converted several file formats into PDF's but I am struck with HTML to PDF conversion. I am able to convert a HTML file into a PDF successfully but the CSS part is not rendering into the generated PDF. Any idea on this? I saved www.google.com as my input HTML file. Here is my controller code.
using Aspose.Pdf.Generator
Pdf pdf = new Pdf();
pdf.HtmlInfo.CharSet = "UTF-8";
Section section = pdf.Sections.Add();
StreamReader r = File.OpenText(#"Local HTML File Path");
Text text2 = new Aspose.Pdf.Generator.Text(section, r.ReadToEnd());
pdf.HtmlInfo.ExternalResourcesBasePath = "Local HTML File Path";
text2.IsHtmlTagSupported = true;
text2.IsFitToPage = true;
section.Paragraphs.Add(text2);
pdf.Save(#"Generated PDF File Path");
Am i missing something? Any kind of help is greatly appreciated.
Thanks

My name is Tilal Ahmad and I am developer evangelist at Aspose.
Please use new DOM approach(Aspose.Pdf.Document) for HTML to PDF conversion. In this approach to render external resources(CSS/Images/Fonts) you need to pass resources path to HtmlLoadOptions() method. Please check following documentation links for the purpose.
Convert HTML to PDF (new DOM)
HtmlLoadOptions options = new HtmlLoadOptions(resourcesPath);
Document pdfDocument = new Document(inputPath, options);
pdfDocument.Save("outputPath");
Convert Webpage to PDF(new DOM)
// Create a request for the URL.
WebRequest request = WebRequest.Create("https:// En.wikipedia.org/wiki/Main_Page");
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;
// Time out in miliseconds before the request times out
// Request.Timeout = 100;
// Get the response.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();
reader.Close();
dataStream.Close();
response.Close();
MemoryStream stream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(responseFromServer));
HtmlLoadOptions options = new HtmlLoadOptions("https:// En.wikipedia.org/wiki/");
// Load HTML file
Document pdfDocument = new Document(stream, options);
options.PageInfo.IsLandscape = true;
// Save output as PDF format
pdfDocument.Save(outputPath);

Try using media attribute in each style tag
<style media="print">
and then provide the html file to your Aspose.Pdf Generator.

Try this.. This is working nice for me
var license = new Aspose.Pdf.License();
license.SetLicense("Aspose.Pdf.lic");
var license = new Aspose.Html.License();
license.SetLicense("Aspose.Html.lic");
using (MemoryStream memoryStream = new MemoryStream())
{
var options = new PdfRenderingOptions();
using (PdfDevice pdfDevice = new PdfDevice(options, memoryStream))
{
using (var renderer = new HtmlRenderer())
{
using (HTMLDocument htmlDocument = new HTMLDocument(content, ""))
{
renderer.Render(pdfDevice, htmlDocument);
//Save memoryStream into output pdf file
}
}
}
}
content is string type which is my html content.

Related

How to read Json response into Document object of Document AI in C#

I am trying to read json file from cloud storage and trying to convert that into Google.Cloud.DocumentAI.V1.Document.
I have done POC, but its throwing exception Google.Protobuf.InvalidProtocolBufferException: 'Protocol message end-group tag did not match expected tag.'
First I am reading .Json file into MemoryStream and trying to Merge in to Document.
using Google.Cloud.DocumentAI.V1;
public static void StreamToDocument()
{
byte[] fileContents = File.ReadAllBytes("C:\\upload\\temp.json");
using (Stream memoryStream = new MemoryStream(fileContents))
{
Document doc = new Document();
var byteArray = memoryStream;
doc.MergeFrom(byteArray);
}
}
Error Message I am getting is
Is there any other way I can achieve this ?
The code that you've specified there expects the data to be binary protobuf content. For JSON, you want:
string json = File.ReadAllText("C:\\upload\\temp.json");
Document document = Document.Parser.ParseJson(json);

Inserting a doc file inplace of place holder

I have a word document which contain many pages. One of those pages contain a placeholder instead of other content. so I want to replace that placeholder with another doc file without losing formatting. This doc file which is to be replaced may have many pages. How can I replace that placeholder with this doc file programmatically.. I searched many but could not find any option to insert a doc file replacing a placeholder.. Thank You In Advance.
Or how can we copy the contents of doc to be inserted and then replace the placeholder with copied content
I found a post here.The below code is from that post.
With the library, you can do the following to replace text from a Word document, considering that documentByteArray is your document byte content taken from database:
using (MemoryStream mem = new MemoryStream())
{
mem.Write(documentByteArray, 0, (int)documentByteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
if instead of "Hi Everyone" if we replace it with a binarydata,which is an array of bytes
byte[] binarydata = File.ReadAllBytes(filepaths);
how can we modify the program?
First of all you should get a Nuget package called Novacode.Docx, this is what I have found to be the best Document creator and editor in the last few years.
using Novacode.Docx;
void Main()
{
var doc = DocX.Load(#"c:\temp\existingDoc.docx");
var docToAdd = DocX.Load(#"c:\temp\docToAdd.docx");
doc.InsertDocument(docToAdd, true); //version 1.0.0.22
doc.InsertDocument(docToAdd); //version 1.0.0.19
}
this is the most simple and basic implementation of what it is that youre after but this works.
for anything else take a look at the documentation at
https://docx.codeplex.com/
or
http://cathalscorner.blogspot.co.uk/
this will be the best place to start. I would also recommend that if you do use this one that you use the version 1.0.0.19 as there are some formatting issues in 1.0.0.22

save Pdf File in the particular folder instead of downloading

I am doing html to pdf file . Its Downloading instantly . I dont want download instantly. i want to save the file in my project folder once converted.
My C# Code
string html ="<table><tr><td>some contents</td></tr></table>";
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=WelcomeLetter.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
StringReader sr = new StringReader(table);
Document ResultPDF = new Document(iTextSharp.text.PageSize.A4, 25, 10, 20, 30);
PdfPTable Headtable = new PdfPTable(7);
Headtable.TotalWidth = 525f;
Headtable.LockedWidth = true;
Headtable.HeaderRows = 5;
Headtable.FooterRows = 2;
Headtable.KeepTogether = true;
HTMLWorker htmlparser = new HTMLWorker(ResultPDF);
PdfWriter.GetInstance(ResultPDF, Response.OutputStream);
ResultPDF.Open();
htmlparser.Parse(sr);
ResultPDF.Close();
Response.Write(ResultPDF);
Response.End();
For saving pdf file locally in your project folder you can use FileStream class like this.
FileStream stream = new FileStream(filePath, FileMode.Create);//Here filePath is path of your project folder.
Now use this stream instead of using Response.OutputStream when you create instance of PdfWriter object.
PdfWriter.GetInstance(ResultPDF, stream);
Now do not use Responce.Write as you don't want to download your file.And close your stream at end.
stream.Close();
I'm going to combine everyone's answer into one that you should be able to drop in and use. If this works, I would accept Manish Parakhiya's answer because that had the most important part.
First, I'm going to assume you are using a recent version of iTextSharp. I think 5.5.5 is the most recent version. Second, because of this, I'm going to restructure your code a bit in order to use the using pattern. If you're stuck on an older obsolete unsupported version like 4.1.6 you'll need to re-adjust.
Almost every tutorial out there shows you that you can bind directly the Response.OutputStream. This is 100% valid but I would argue that it is also a really bad idea. Instead, bind to a more generic MemoryStream. This makes debugging much easier and your code will port and adapt that much easier.
The below code includes comments about each of the changes and what things are actually doing. The top section is all about creating a PDF from a string of HTML. The bottom actually does something with it, including writing it to disk and/or streaming it to a browser.
//Will hold our PDF eventually
Byte[] bytes;
//HTML that we want to parse
string html = "<table><tr><td>some contents</td></tr></table>";
//Create a MemoryStream to write our PDF to
using (var ms = new MemoryStream()) {
//Create our document abstraction
using (var ResultPDF = new Document(iTextSharp.text.PageSize.A4, 25, 10, 20, 30)) {
//Bind a writer to our Document abstraction and our stream
using (var writer = PdfWriter.GetInstance(ResultPDF, ms)) {
//Open the PDF for writing
ResultPDF.Open();
//Parse our HTML using the old, obsolete, not support parser
using (var sw = new StringWriter()) {
using (var hw = new HtmlTextWriter(sw)) {
using (var sr = new StringReader(html)) {
using (var htmlparser = new HTMLWorker(ResultPDF)) {
htmlparser.Parse(sr);
}
}
}
}
//Close the PDF
ResultPDF.Close();
}
}
//Grab the raw bytes of the PDF
bytes = ms.ToArray();
}
//At this point, the bytes variable holds a valid PDF file.
//You can write it disk:
System.IO.File.WriteAllBytes("your file path here", bytes);
//You can also send it to a browser:
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=WelcomeLetter.pdf");
Response.BinaryWrite(bytes);
Response.Cache.SetCacheability(HttpCacheability.NoCache);
//Never do the next line, it doesn't do what you think it does and actually produces corrupt PDFs
//Response.Write(ResultPDF); //BAD!!!!!!
Response.End();
string tempDirectory = Session.SessionID.ToString();
string location = Path.Combine(Server.MapPath(
WebConfigurationManager.AppSettings["PathSet"].ToString()), tempDirectory);
if (!Directory.Exists(location))
{
Directory.CreateDirectory(location);
}
string fileName="abc.pdf";
filePath = Path.Combine(location, fileName);

Image is not opening after converting with aspose

I used the below code to convert url stream into tiff image. But after conversion, the convert image is not opening for preview. Any Ideas?
var myRequest = (HttpWebRequest)WebRequest.Create("http://www.google.com");
myRequest.Method = "GET";
var myResponse = myRequest.GetResponse();
var responseStream = myResponse.GetResponseStream();
var memoryStream = new MemoryStream();
responseStream.CopyTo(memoryStream);
var loadOptions = new LoadOptions();
loadOptions.LoadFormat = LoadFormat.Html;
var doc = new Document(memoryStream, loadOptions);
var htmlOptions = new HtmlFixedSaveOptions();
htmlOptions.ExportEmbeddedCss = true;
htmlOptions.ExportEmbeddedFonts = true;
htmlOptions.ExportEmbeddedImages = true;
doc.Save(#"C:\out.tif", htmlOptions);
You are using HtmlFixedSaveOptions in Save() method, so it will save as HTML. Try to open the out.tif in any text editor, you will see HTML tags.
Please use ImageSaveOptions in Save() method, to save in image format. Even then, if you manually get the web page from URL in stream, it only gets the HTML. Without css, the saved image will not look good. I would recommend to let Aspose handle the URL.
// If you provide a URL in string, Aspose will load the web page
var doc = new Aspose.Words.Document("http://www.google.com");
// If you just provide the TIF extension, it will save as TIFF image
doc.Save(#"c:\out.tif");
// TO customize, you can use save options in save method
I work for Aspose as Developer Evangelist.

asp.net open word document

I try to open a word document with c#.
When I open the document, the page is blocked after.
Here is the code :
HttpContext.Current.Response.Write(temp);
//HttpContext.Current.Response.End();
//HttpContext.Current.Response.Flush();
//HttpContext.Current.Response.Write(sw.ToString());
//HttpContext.Current.Response.clear();
//HttpContext.Current.Response.End();
//HttpContext.Current.Response.SuppressContent = true;
//HttpContext.Current.Response.Close();
//Response.Redirect(Page.Request.Url.AbsolutePath.Substring(0, Page.Request.Url.AbsolutePath.LastIndexOf("/")) + "/PIEditor.aspx?PostID=" + Request.Params["PostID"], true);`
//HttpContext.Current.Response.End();
As you see, I tried different options but without result, the window for opening or saving the document is displayed but I can't click on any buttons the page after. It looks like it is deactivated or stopped.
you can try GemBox.Document component to export Word document from ASP.NET application, if that is what you are trying to do.
Here is a sample C# code that should go in ASPX page code behind:
// Create a new empty document.
DocumentModel document = new DocumentModel();
// Add document content.
document.Sections.Add(new Section(document, new Paragraph(document, "Hello World!")));
// Microsoft Packaging API cannot write directly to Response.OutputStream.
// Therefore we use temporary MemoryStream.
using (MemoryStream documentStream = new MemoryStream())
{
document.Save(documentStream, SaveOptions.DocxDefault);
// Stream file to browser.
Response.Clear();
Response.ContentType = "application/vnd.openxmlformats";
Response.AddHeader("Content-Disposition", "attachment; filename=Document.docx");
documentStream.WriteTo(Response.OutputStream);
Response.End();
}
Try the below code:
//create new MemoryStream object and add PDF file’s content to outStream.
MemoryStream outStream = new MemoryStream();
//specify the duration of time before a page cached on a browser expires
Response.Expires = 0;
//specify the property to buffer the output page
Response.Buffer = true;
//erase any buffered HTML output
Response.ClearContent();
//add a new HTML header and value to the Response sent to the client
Response.AddHeader(“content-disposition”, “inline; filename=” + “output.doc”);
//specify the HTTP content type for Response as Pdf
Response.ContentType = “application/msword”;
//write specified information of current HTTP output to Byte array
Response.BinaryWrite(outStream.ToArray());
//close the output stream
outStream.Close();
//end the processing of the current page to ensure that no other HTML content is sent
Response.End();

Categories