I am using Spire.PDF to convert my HTML template to a PDF file. Here is my sample code:
using System.Drawing;
using System.Threading;
using Spire.Pdf;
using Spire.Pdf.HtmlConverter;
class Program
{
static void Main(string[] args)
{
//Create a pdf document.
PdfDocument doc = new PdfDocument();
PdfPageSettings setting = new PdfPageSettings();
setting.Size = new SizeF(1000,1000);
setting.Margins = new Spire.Pdf.Graphics.PdfMargins(20);
PdfHtmlLayoutFormat htmlLayoutFormat = new PdfHtmlLayoutFormat();
htmlLayoutFormat.IsWaiting = true;
String url = "https://www.wikipedia.org/";
Thread thread = new Thread(() =>
{ doc.LoadFromHTML(url, false, false, false, setting,htmlLayoutFormat); });
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
//Save pdf file.
doc.SaveToFile("output-wiki.pdf");
doc.Close();
//Launching the Pdf file.
System.Diagnostics.Process.Start("output-wiki.pdf");
}
}
This works as expected, but now I want to add a header and a footer to every page. Adding a header and footer is possible with Spire.PDF, but my requirement is to render an HTML template in the header, which I have not been able to achieve. Is there any way to render an HTML template in the header and footer?
Spire.PDF provides a class, PdfHTMLTextElement, that can render simple HTML tags (Font, B, I, U, Sub, Sup and BR) on a PDF page. You can draw HTML into the header area of an existing PDF document with the following code snippet. As far as I know, Spire.PDF cannot render complex HTML as only one part of a page.
using System.Drawing;
using Spire.Pdf;
using Spire.Pdf.Graphics;

//load an existing pdf document
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pdf");
//loop through the pages
for (int i = 0; i < doc.Pages.Count; i++)
{
//get the specific page
PdfPageBase page = doc.Pages[i];
//define the HTML string (only the simple tags listed above are rendered)
string htmlText = "<b>XXX Inc.</b><br/><i>Tel:889 974 544</i><br/><font color='#FF4500'>Website:www.xxx.com</font>";
//render HTML text
PdfFont font = new PdfFont(PdfFontFamily.Helvetica, 12);
PdfBrush brush = PdfBrushes.Black;
PdfHTMLTextElement richTextElement = new PdfHTMLTextElement(htmlText, font, brush);
richTextElement.TextAlign = TextAlign.Left;
//draw the HTML string in the white space at the top of the page
richTextElement.Draw(page.Canvas, new RectangleF(70, 20, page.GetClientSize().Width - 140, page.GetClientSize().Height - 20));
}
//save to file and release the document
doc.SaveToFile("output.pdf");
doc.Close();
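The snippet only fills the header. For the footer the question also asks about, one option is to compute a second rectangle near the bottom of the client area and pass it to the same richTextElement.Draw(page.Canvas, ...) call. The following is a minimal sketch of that arithmetic only. HeaderFooterLayout and its parameters are hypothetical names, not part of Spire.PDF, and it assumes client-area coordinates start at the top-left corner, as the (70, 20) offset in the snippet suggests.

```csharp
using System.Drawing;

// Hypothetical helper (not a Spire.PDF API): computes header and footer bands
// inside a page's client area. All values are in points.
public static class HeaderFooterLayout
{
    // Band across the top, starting 'top' points below the client-area edge.
    public static RectangleF Header(SizeF client, float sideMargin, float top, float height)
    {
        return new RectangleF(sideMargin, top, client.Width - 2 * sideMargin, height);
    }

    // Band across the bottom, ending 'bottom' points above the client-area edge.
    public static RectangleF Footer(SizeF client, float sideMargin, float bottom, float height)
    {
        return new RectangleF(sideMargin, client.Height - bottom - height,
                              client.Width - 2 * sideMargin, height);
    }
}
```

For example, HeaderFooterLayout.Footer(page.GetClientSize(), 70, 20, 40) would describe a 40-point band ending 20 points above the bottom of the client area; check that the band does not overlap the page's existing content.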
Related
So I am trying to convert a standard A4 PDF file into a .txt file using the Spire.Pdf NuGet package, and whenever I do, there is a lot of whitespace at the start of each line, where I presume the document's margins are. I managed to solve the issue with the TrimStart() method, but I want to be able to remove the margins using Spire.Pdf itself.
I have played around with setting a RectangleF as the ExtractArea of PdfTextExtractOptions, but for some reason it cuts off the bottom of the text and I lose rows.
My code is:
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"path");
var content = new List<string>();
RectangleF rectangle = new RectangleF(45, 0, 0, 0);
PdfTextExtractOptions options = new() { IsExtractAllText = true, IsShowHiddenText = true, ExtractArea = rectangle };
foreach (PdfPageBase page in doc.Pages)
{
PdfTextExtractor textExtractor = new(page);
//extract text from a specific rectangular area here - default A4 margin sizes?
string extractedText = textExtractor.ExtractText(options);
content.Add(extractedText);
}
//dispose the writer so the text is flushed to the file
using (FileStream fs = new FileStream(@"outputFile.txt", FileMode.Create))
using (StreamWriter sw = new StreamWriter(fs))
{
string txtBefore = string.Join("\n", content);
sw.Write(txtBefore);
}
Thanks in advance
You can try the code below to extract text from the PDF. With IsSimpleExtraction enabled, it does not add extra whitespace at the start of each line in the resulting .txt file. I have already tested it.
using System.IO;
using System.Text;
using Spire.Pdf;
using Spire.Pdf.Texts;

PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"test.pdf");
PdfTextExtractOptions options = new PdfTextExtractOptions();
options.IsSimpleExtraction = true;
StringBuilder sb = new StringBuilder();
foreach (PdfPageBase page in doc.Pages)
{
PdfTextExtractor extractor = new PdfTextExtractor(page);
sb.AppendLine(extractor.ExtractText(options));
}
doc.Close();
File.WriteAllText("Extract.txt", sb.ToString());
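If you still want to restrict extraction with ExtractArea, note that the question's rectangle, new RectangleF(45, 0, 0, 0), has zero width and height. I have not verified how Spire.PDF interprets a zero-size area, but a rectangle with an explicit size covering the page minus its margins is the more conventional input. The following is a minimal sketch of that calculation. ExtractAreaHelper is a hypothetical helper, not part of Spire.PDF; sizes are in points (an A4 page is roughly 595 x 842 points).

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not a Spire.PDF API): the page area left after
// removing the given margins. All values are in points.
public static class ExtractAreaHelper
{
    public static RectangleF WithoutMargins(SizeF pageSize, float left, float top, float right, float bottom)
    {
        float width = pageSize.Width - left - right;
        float height = pageSize.Height - top - bottom;
        if (width <= 0 || height <= 0)
            throw new ArgumentException("The margins are larger than the page.");
        return new RectangleF(left, top, width, height);
    }
}
```

Assuming page.Size reports the page size in points, usage inside the loop could look like options.ExtractArea = ExtractAreaHelper.WithoutMargins(page.Size, 45, 0, 45, 0); computing it per page also copes with documents whose pages differ in size.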
I am using ASP.NET and need to generate reports in PDF format and send the file back to clients so they can download it.
I made the reports using the MigraDoc library and they looked great, but when I tried Arabic text I found that the text was laid out left-to-right and the characters were disjointed, so I wrote this code to test things out:
...............
MigraDoc.DocumentObjectModel.Document reportDoc = new MigraDoc.DocumentObjectModel.Document();
reportDoc.Info.Title = "test";
sec = reportDoc.AddSection();
string fileName = "test.pdf";
addformattedText(sec, "العبارة", true);
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true);
renderer.Document = reportDoc;
renderer.RenderDocument();
MemoryStream pdfStream = new MemoryStream();
renderer.PdfDocument.Save(pdfStream);
byte[] bytes = pdfStream.ToArray();
...............
private void addformattedText(Section sec,string text, bool shouldBeBold = false)
{
var tf = sec.AddTextFrame();
var p = tf.AddParagraph(text);
p.Format.Font.Name = "Tahoma";
if (shouldBeBold) p.Format.Font.Bold = true;
}
In the output, the text runs left-to-right and the characters are disjointed (not connected to each other).
I also tried escaping the text into a Unicode escape string with this code:
private string getEscapedString(string text)
{
if (HasArabicCharacters(text))
{
var sb = new StringBuilder();
// Escape each UTF-16 code unit (char) as \uXXXX, e.g. 'ب' -> \u0628
foreach (char c in text)
{
sb.AppendFormat(@"\u{0:x4}", (int)c);
}
return sb.ToString();
}
else
return text;
}
I then put the returned string into a paragraph and rendered the PDF with the unicode parameter set to true, but the result is the same.
I cannot figure out how to get this working.
The reports were made with the MigraDoc 1.50.5147 library.
The problem is that each Arabic letter can take up to four different shapes (initial, medial, final and isolated), and PDFsharp and MigraDoc cannot determine which shape to print. Furthermore, you need to reverse the character order. To solve this, you can use AraibcPdfUnicodeGlyphsResharper to do this work, as follows:
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using AraibcPdfUnicodeGlyphsResharper;
namespace MigraDocArabic
{
internal class PrintArabicUsingPdfSharp
{
public PrintArabicUsingPdfSharp(string path)
{
PdfDocument document = new PdfDocument();
document.Info.Title = "Created with PDFsharp";
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
// Create an empty page
PdfPage page = document.AddPage();
// Get an XGraphics object for drawing
XGraphics gfx = XGraphics.FromPdfPage(page);
// Create a font
XFont font = new XFont("Arial", 20, XFontStyle.BoldItalic);
var xArabicString = "كتابة اللغة العربية شيئ جميل".ArabicWithFontGlyphsToPfd();
// Draw the text
gfx.DrawString("Hello, World!", font, XBrushes.Black, new XRect(0, 0, page.Width, page.Height), XStringFormats.Center);
gfx.DrawString(xArabicString, font, XBrushes.Black, new XRect(50, 50, page.Width, page.Height), XStringFormats.Center);
// Save the document...
document.Save(path);
}
}
}
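Do not forget the using directive for the namespace that provides the ArabicWithFontGlyphsToPfd extension method (using AraibcPdfUnicodeGlyphsResharper;).
By the way, this also works with iText7.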
PDFsharp does not support RTL languages yet:
http://www.pdfsharp.net/wiki/PDFsharpFAQ.ashx#Does_PDFsharp_support_for_Arabic_Hebrew_CJK_Chinese_Japanese_Korean_6
You can work around this limitation by reversing the string.
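A minimal sketch of that workaround using only the .NET base class library. ReverseForLtrRenderer is a hypothetical helper name. It reverses text elements rather than raw chars, so a letter keeps the combining marks (such as harakat) that follow it. It does not handle mixed-direction text (embedded Latin words or digits would be reversed too), and it does nothing about letter shapes.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class RtlText
{
    // Reverses the order of text elements (a base character plus any
    // combining marks), producing visual order for a left-to-right renderer.
    public static string ReverseForLtrRenderer(string text)
    {
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(text);
        while (e.MoveNext())
        {
            elements.Add(e.GetTextElement());
        }
        elements.Reverse();
        return string.Concat(elements);
    }
}
```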
PDFsharp does not support font ligatures yet. You are probably able to work around this limitation by replacing letters with the correct glyph (start, middle, end) depending on the position.
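A minimal sketch of that position-based replacement, mapping a few letters to their Unicode Arabic Presentation Forms-B glyphs. ArabicShaper is a hypothetical name. Only ten letters are mapped, unmapped characters (including diacritics) break the joining, and the mandatory lam-alef ligature is not handled; a complete implementation needs the full joining table. It also assumes the drawing font contains the presentation-form glyphs. The result is still in logical order, so for PDFsharp's left-to-right layout it would still need to be reversed.

```csharp
using System.Collections.Generic;
using System.Text;

// Sketch only: chooses isolated/final/initial/medial glyphs for a few letters.
public static class ArabicShaper
{
    // Forms in order: isolated, final, initial, medial.
    // Right-joining letters have no initial/medial form ('\0').
    private static readonly Dictionary<char, char[]> Forms = new Dictionary<char, char[]>
    {
        ['\u0627'] = new[] { '\uFE8D', '\uFE8E', '\0', '\0' },         // ALEF (right-joining)
        ['\u0628'] = new[] { '\uFE8F', '\uFE90', '\uFE91', '\uFE92' }, // BEH
        ['\u062A'] = new[] { '\uFE95', '\uFE96', '\uFE97', '\uFE98' }, // TEH
        ['\u062F'] = new[] { '\uFEA9', '\uFEAA', '\0', '\0' },         // DAL (right-joining)
        ['\u0631'] = new[] { '\uFEAD', '\uFEAE', '\0', '\0' },         // REH (right-joining)
        ['\u0644'] = new[] { '\uFEDD', '\uFEDE', '\uFEDF', '\uFEE0' }, // LAM
        ['\u0645'] = new[] { '\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4' }, // MEEM
        ['\u0646'] = new[] { '\uFEE5', '\uFEE6', '\uFEE7', '\uFEE8' }, // NOON
        ['\u0648'] = new[] { '\uFEED', '\uFEEE', '\0', '\0' },         // WAW (right-joining)
        ['\u064A'] = new[] { '\uFEF1', '\uFEF2', '\uFEF3', '\uFEF4' }, // YEH
    };

    // True if the letter can connect to the letter that follows it.
    private static bool JoinsToNext(char c) => Forms.TryGetValue(c, out var f) && f[2] != '\0';

    // Returns the shaped string, still in logical (right-to-left) order.
    public static string Shape(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (!Forms.ContainsKey(c)) { sb.Append(c); continue; }
            bool joinsPrev = i > 0 && JoinsToNext(text[i - 1]);
            bool joinsNext = JoinsToNext(c) && i + 1 < text.Length && Forms.ContainsKey(text[i + 1]);
            int form = joinsPrev && joinsNext ? 3 : joinsPrev ? 1 : joinsNext ? 2 : 0;
            sb.Append(Forms[c][form]);
        }
        return sb.ToString();
    }
}
```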
I have three PDFs and I want to add their pages to an output PDF file. What I'm trying to do is: import the first PDF -> create a new PDF document -> add the pages -> draw on a certain page -> finally add the pages from that document to the main PDF document that will be exported. Then do the same with the second PDF file if needed.
ERROR: A PDF document must be opened with PdfDocumentOpenMode.Import to import pages from it.
From the main class I call the method that processes the PDF:
Pdftest pdftest = new Pdftest();
PdfDocument pdf = PdfReader.Open(@"C:\Users\pdf_file.pdf", PdfDocumentOpenMode.Import);
pdftest.CreatePages(pdf);
Pdftest class:
public class Pdftest
{
PdfDocument PDFNewDoc = new PdfDocument();
XFont fontChico = new XFont("Verdana", 8, XFontStyle.Bold);
XFont fontGrande = new XFont("Verdana", 12, XFontStyle.Bold);
XBrush fontBrush = XBrushes.Black;
public void CreatePages(PdfDocument archivoPdf)
{
PdfDocument NuevoDoc = new PdfDocument();
for (int Pg = 0; Pg < archivoPdf.Pages.Count; Pg++)
{
PdfPage pp = NuevoDoc.AddPage(archivoPdf.Pages[Pg]);
}
XGraphics Graficador = XGraphics.FromPdfPage(NuevoDoc.Pages[0]);
XPoint coordinate = new XPoint();
coordinate.X = XUnit.FromInch(1.4);
coordinate.Y = XUnit.FromInch(1.8);
Graficador.DrawString("TEST", fontChico, fontBrush, coordinate);
for (int Pg = 0; Pg < NuevoDoc.Pages.Count; Pg++)
{
PdfPage pp = PDFNewDoc.AddPage(NuevoDoc.Pages[Pg]); //Error mentioned.
}
}
}
The error message relates to NuevoDoc. You have to save NuevoDoc in a MemoryStream and re-open it in Import mode to get your code going.
I do not understand why you copy pages from NuevoDoc to PDFNewDoc, so you can most likely avoid the MemoryStream when optimizing the code.
I am trying to add two pages in one document. These two pages are generated from HTML.
Info: HTML Renderer for PDF using PDFsharp (HtmlRenderer.PdfSharp 1.5.0.6)
var config = new PdfGenerateConfig
{
PageOrientation = PageOrientation.Portrait,
PageSize = PageSize.A4,
MarginBottom = 0,
MarginLeft = 0,
MarginRight = 0,
MarginTop = 0
};
string pdfFirstPage = CreateHtml();
string pdfsecondPage = CreateHtml2();
PdfDocument doc=new PdfDocument();
doc.AddPage(new PdfPage(PdfGenerator.GeneratePdf(pdfFirstPage, config)));
doc.AddPage(new PdfPage(PdfGenerator.GeneratePdf(pdfsecondPage, config)));
I tried a few approaches, but the error I get most often is about Import mode. This is my latest attempt, and it does not work either. How can I combine two pages generated from HTML strings into one two-page document and download it?
Here is code that works:
static void Main(string[] args)
{
PdfDocument pdf1 = PdfGenerator.GeneratePdf("<p><h1>Hello World</h1>This is html rendered text #1</p>", PageSize.A4);
PdfDocument pdf2 = PdfGenerator.GeneratePdf("<p><h1>Hello World</h1>This is html rendered text #2</p>", PageSize.A4);
PdfDocument pdf1ForImport = ImportPdfDocument(pdf1);
PdfDocument pdf2ForImport = ImportPdfDocument(pdf2);
var combinedPdf = new PdfDocument();
combinedPdf.Pages.Add(pdf1ForImport.Pages[0]);
combinedPdf.Pages.Add(pdf2ForImport.Pages[0]);
combinedPdf.Save("document.pdf");
}
private static PdfDocument ImportPdfDocument(PdfDocument pdf1)
{
using (var stream = new MemoryStream())
{
pdf1.Save(stream, false);
stream.Position = 0;
var result = PdfReader.Open(stream, PdfDocumentOpenMode.Import);
return result;
}
}
I save each PDF document to a MemoryStream and open it for import. This allows adding its pages to a new PdfDocument. Only the first page of each document is used for simplicity; add loops as needed.
I have a project where HTML code is converted to a PDF using HTML Renderer. The HTML contains a single table. The PDF is displayed, but the contents of the table are cut off at the end. Is there any solution to this problem?
PdfDocument pdf=new PdfDocument();
var config = new PdfGenerateConfig()
{
MarginBottom = 20,
MarginLeft = 20,
MarginRight = 20,
MarginTop = 20,
};
//config.PageOrientation = PageOrientation.Landscape;
config.ManualPageSize = new PdfSharp.Drawing.XSize(1080, 828);
pdf = PdfGenerator.GeneratePdf(html, config);
byte[] fileContents = null;
using (MemoryStream stream = new MemoryStream())
{
pdf.Save(stream, true);
fileContents = stream.ToArray();
return new FileStreamResult(new MemoryStream(fileContents), "application/pdf");
}
HTMLRenderer should be able to break the table onto the next page.
See also:
https://github.com/ArthurHub/HTML-Renderer/pull/41
Make sure you are using the latest version. You may have to add those CSS properties.
Also see this answer:
https://stackoverflow.com/a/37833107/162529
As far as I know, page breaks are not supported, but I've built a workaround (which may not work in all cases) that splits the HTML into separate pages using a page-break class and then adds each page to the PDF.
See example code below:
//This will only work on page break elements that are direct children of the body element.
//Each page's content must be inside the pagebreak element
private static PdfDocument SplitHtmlIntoPagedPdf(string html, string pageBreakBeforeClass, PdfGenerateConfig config, PdfDocument pdf)
{
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");
var tempHtml = string.Empty;
foreach (var bodyNode in htmlBodyNode.ChildNodes)
{
if (bodyNode.Attributes["class"]?.Value == pageBreakBeforeClass)
{
if (!string.IsNullOrWhiteSpace(tempHtml))
{
//add any content found before the page break
AddPageToPdf(htmlDoc,tempHtml,config,ref pdf);
tempHtml = string.Empty;
}
AddPageToPdf(htmlDoc,bodyNode.OuterHtml,config,ref pdf);
}
else
{
tempHtml += bodyNode.OuterHtml;
}
}
if (!string.IsNullOrWhiteSpace(tempHtml))
{
//add any content found after the last page break
AddPageToPdf(htmlDoc, tempHtml, config, ref pdf);
}
return pdf;
}
private static void AddPageToPdf(HtmlDocument htmlDoc, string html, PdfGenerateConfig config, ref PdfDocument pdf)
{
var tempDoc = new HtmlDocument();
tempDoc.LoadHtml(htmlDoc.DocumentNode.OuterHtml);
var docNode = tempDoc.DocumentNode;
docNode.SelectSingleNode("//body").InnerHtml = html;
var nodeDoc = PdfGenerator.GeneratePdf(docNode.OuterHtml, config);
using (var tempMemoryStream = new MemoryStream())
{
nodeDoc.Save(tempMemoryStream, false);
//rewind the stream before reopening it for import
tempMemoryStream.Position = 0;
var openedDoc = PdfReader.Open(tempMemoryStream, PdfDocumentOpenMode.Import);
foreach (PdfPage page in openedDoc.Pages)
{
pdf.AddPage(page);
}
}
}
Then call the code as follows:
var pdf = new PdfDocument();
var config = new PdfGenerateConfig()
{
MarginLeft = 5,
MarginRight = 5,
PageOrientation = PageOrientation.Portrait,
PageSize = PageSize.A4
};
if (!string.IsNullOrWhiteSpace(pageBreakBeforeClass))
{
pdf = SplitHtmlIntoPagedPdf(html, pageBreakBeforeClass, config, pdf);
}
else
{
pdf = PdfGenerator.GeneratePdf(html, config);
}
For any HTML that you want on its own page, put it inside a div with the class "pagebreak" (or whatever you want to call it) as a direct child of the body element. If you like, you can also add that class to your CSS with "page-break-before: always;" so that the HTML is print-friendly.
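A minimal sketch that builds HTML in the shape SplitHtmlIntoPagedPdf expects: each page fragment wrapped in a div that is a direct child of the body, whose class attribute is exactly the page-break class (the split compares the whole attribute value, so extra classes on the div would stop it from matching). PagedHtmlBuilder is a hypothetical helper. The style rule only serves the print-friendliness mentioned above; I have not checked whether HTML Renderer itself honors page-break-before.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PagedHtmlBuilder
{
    // Wraps each fragment in <div class="{pageBreakClass}"> directly under <body>
    // and adds a page-break-before rule for that class in <head>.
    public static string Build(IEnumerable<string> pageFragments, string pageBreakClass)
    {
        var sb = new StringBuilder();
        sb.Append("<html><head><style>.")
          .Append(pageBreakClass)
          .Append(" { page-break-before: always; }</style></head><body>");
        foreach (string fragment in pageFragments)
        {
            sb.Append("<div class=\"").Append(pageBreakClass).Append("\">")
              .Append(fragment)
              .Append("</div>");
        }
        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

The result could then be passed in as the html argument, e.g. SplitHtmlIntoPagedPdf(PagedHtmlBuilder.Build(pages, "pagebreak"), "pagebreak", config, pdf). Because AddPageToPdf copies the whole document and only replaces the body, the head (and its style rule) is kept for every page.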
I've just figured out how to make it work: instead of setting page-break-inside on the TD, set it on the TABLE. Here's the CSS:
table { page-break-inside: avoid; }
I'm currently on the following versions (not working on stable versions at the moment):
HtmlRenderer on v1.5.1-beta1
PDFsharp on v1.51.5185-beta