Convert html to pdf and merge it with existing pdfs

Convert html to pdf and merge it with existing pdfs - c#

I have a System.Net.Mail.MailMessage which shall have it's html body and pdf attachments converted into one single pdf.
Converting the html body to pdf works for me with this answer
Converting the pdf attachments into one pdf works for me with this answer
However after ~10 hours of trying I can not come up with a combined solution which does both. All I'm getting are NullReferenceExceptions somewhere in IText source, "the document is not open", etc...
For example, this will throw no error but the resulting pdf will only contain the attachments but not the html email body:
Document document = new Document();
StringReader sr = new StringReader(mail.Body);
HTMLWorker htmlparser = new HTMLWorker(document);
using (FileStream fs = new FileStream(targetPath, FileMode.Create))
{
PdfCopy writer = new PdfCopy(document, fs);
document.Open();
htmlparser.Parse(sr);
foreach (string fileName in pdfList)
{
PdfReader reader = new PdfReader(fileName);
reader.ConsolidateNamedDestinations();
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
PRAcroForm form = reader.AcroForm;
if (form != null)
{
writer.CopyAcroForm(reader);
}
reader.Close();
}
writer.Close();
document.Close();
}
I'm using the LGPL licensed ITextSharp 4.1.6

From v4.1.6 fanboy to v4.1.6 fanboy :D
Looks like the HTMLWorker is closing the documents stream right after parsing. So as a workaround, you could create a pdf from your mailbody in memory. And then add this one together with the attachment to your final pdf.
Here is some code, that should do the trick:
StringReader htmlStringReader = new StringReader("<html><body>Hello World!!!!!!</body></html>");
byte[] htmlResult;
using (MemoryStream htmlStream = new MemoryStream())
{
Document htmlDoc = new Document();
PdfWriter htmlWriter = PdfWriter.GetInstance(htmlDoc, htmlStream);
htmlDoc.Open();
HTMLWorker htmlWorker = new HTMLWorker(htmlDoc);
htmlWorker.Parse(htmlStringReader);
htmlDoc.Close();
htmlResult = htmlStream.ToArray();
}
byte[] pdfResult;
using (MemoryStream pdfStream = new MemoryStream())
{
Document doc = new Document();
PdfCopy copyWriter = new PdfCopy(doc, pdfStream);
doc.Open();
PdfReader htmlPdfReader = new PdfReader(htmlResult);
AppendPdf(copyWriter, htmlPdfReader); // your foreach pdf code here
htmlPdfReader.Close();
PdfReader attachmentReader = new PdfReader("C:\\temp\\test.pdf");
AppendPdf(copyWriter, attachmentReader);
attachmentReader.Close();
doc.Close();
pdfResult = pdfStream.ToArray();
}
using (FileStream fs = new FileStream("C:\\temp\\test2.pdf", FileMode.Create, FileAccess.Write))
{
fs.Write(pdfResult, 0, pdfResult.Length);
}
private void AppendPdf(PdfCopy writer, PdfReader reader)
{
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
}
Ofc you could directly use a FileStream for the final document instead of a MemoryStream as well.

Related

how read pdf layers and add to another page using itextsharp?

How can I read PDF layers and add them to another page using iTextSharp?
I want to copy layers from one PDF page and move them to another PDF page in C#.
I've tried reading from a PDF layer from one page, but I am unable to copy that layer.
var document = new Document();
FileStream outfile = new FileStream(outPutFilePath, FileMode.Create);
var writer = new PdfCopy(document, outfile);
document.Open();
foreach (var fileName in filesPath)
{
var reader = new PdfReader(fileName);
PdfStamper stamper = new PdfStamper(reader, outfile);
Dictionary<String, PdfLayer> layers = stamper.GetPdfLayers();
//PdfLayer layer = layers.get("Nested layer 1");
//layer.setOn(false);
for (var i = 1; i <= reader.NumberOfPages; i++)
{
var page = writer.GetImportedPage(reader, i);
page.ContentTagged = true;
writer.AddPage(page);
}
stamper.Close();
reader.Close();
}
writer.Close();
document.Close();

MemoryStream PDF file in ASP.NET Core 2

Keep getting an error 'Cannot access a closed Stream.'
Does the file need to be physically saved on a server first, prior to being Streamed? I am doing the same with Excel file and it works just fine. Trying to use the same principle here with PDF file.
[HttpGet("ExportPdf/{StockId}")]
public IActionResult ExportPdf(int StockId)
{
string fileName = "test.pdf";
MemoryStream memoryStream = new MemoryStream();
// Create PDF document
Document document = new Document(PageSize.A4, 25, 25, 25, 25);
PdfWriter pdfWriter = PdfWriter.GetInstance(document, memoryStream);
document.Open();
document.Add(new Paragraph("Hello World"));
document.Close();
memoryStream.Position = 0;
return File(memoryStream, "application/pdf", fileName);
}

Are you using iText7? If yes, please add that tag onto your question.
using (var output = new MemoryStream())
{
using (var pdfWriter = new PdfWriter(output))
{
// You need to set this to false to prevent the stream
// from being closed.
pdfWriter.SetCloseStream(false);
using (var pdfDocument = new PdfDocument(pdfWriter))
{
...
}
var renderedBuffer = new byte[output.Position];
output.Position = 0;
output.Read(renderedBuffer, 0, renderedBuffer.Length);
...
}
}

Switched to iText& (7.1.1) version as per David Liang comment
Here is the code that worked for me, a complete method:
[HttpGet]
public IActionResult Pdf()
{
MemoryStream memoryStream = new MemoryStream();
PdfWriter pdfWriter = new PdfWriter(memoryStream);
PdfDocument pdfDocument = new PdfDocument(pdfWriter);
Document document = new Document(pdfDocument);
document.Add(new Paragraph("Welcome"));
document.Close();
byte[] file = memoryStream.ToArray();
MemoryStream ms = new MemoryStream();
ms.Write(file, 0, file.Length);
ms.Position = 0;
return File(fileStream: ms, contentType: "application/pdf", fileDownloadName: "test_file_name" + ".pdf");
}

Not able to get Unicode support for different languages

public PDFFormFillerResult CreateHtmlToPdfMemoryStream(string sHtmlText)
{
//Create a stream that we can write to, in this case a MemoryStream
using (var ms = new MemoryStream())
{
//Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
var doc = new iTextSharp.text.Document();
//Create a writer that's bound to our PDF abstraction and our stream
var writer = PdfWriter.GetInstance(doc, ms);
//Open the document for writing
doc.Open();
var example_html = sHtmlText;
//Create a new HTMLWorker bound to our document
var htmlWorker = new iTextSharp.text.html.simpleparser.HTMLWorker(doc);
FontFactory.Register(Path.Combine(Environment.CurrentDirectory + "\\Library\\arial.ttf"), "Arial"); // just give a path of arial.ttf
StyleSheet css = new StyleSheet();
css.LoadTagStyle("body", "face", "Garamond");
css.LoadTagStyle("body", "encoding", "Identity-H");
css.LoadTagStyle("body", "size", "12pt");
htmlWorker.SetStyleSheet(css);
// htmlWorker.StartDocument();
//HTMLWorker doesn't read a string directly but instead needs a TextReader (which StringReader subclasses)
using (var sr = new StringReader(example_html)) // using (TextReader sr = new StreamReader(example_html, Encoding.UTF8))
{
//Parse the HTML
htmlWorker.Parse(sr);
}
doc.Close();
return new PDFFormFillerResult(ms, PDFFormFillerResultType.Success, string.Empty);
}
}
I need the output as ByteStream as I am merging multiple documents to create one pdf at the end. This works well for English, but when I try using other languages like Russian, Korean, Combodian, etc only the english text which are in template gets displayed.

Copying PDF without loose form field structure with ItextSharp

I would like to get a pdf, keep somes pages, then save it to another destination without losing fieldstructure.
Here the code perfectly working for copying:
string sourceFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string sourceFile = Path.Combine(sourceFolder, "POMultiple.pdf");
string fileName = #"C:\Users\MyUser\Desktop\POMultiple.pdf";
byte[] file = System.IO.File.ReadAllBytes(fileName);
public static void removePagesFromPdf(byte[] sourceFile, String destinationFile, params int[] pagesToKeep)
{
//Used to pull individual pages from our source
PdfReader r = new PdfReader(sourceFile);
//Create our destination file
using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
using (Document doc = new Document())
{
PdfWriter writer = PdfWriter.GetInstance(doc, fs);
//Open the desitination for writing
doc.Open();
//Loop through each page that we want to keep
foreach (int page in pagesToKeep)
{
//Add a new blank page to destination document
doc.NewPage();
//Extract the given page from our reader and add it directly to the destination PDF
writer.DirectContent.AddTemplate(writer.GetImportedPage(r, page), 0, 0);
}
//Close our document
doc.Close();
}
}
}
But when I open "TestOutput.pdf" file in acrobat reader all my fields are empty.
Any Help ?

You need something like this:
PdfReader reader = new PdfReader(sourceFile);
reader.SelectPages(2-4,8-9);
PdfStamper stp = new PdfStamper(reader, new FileStream(destinationFile, FileMode.Create));
stp.Close();
reader.Close();

convert a pdfreader to a pdfdocument

Can anyone tell me how to convert a PdfReader object into a PdfDocument ?
I have read a disk file and converted to a memorystream but I need it as a PdfDocument for other methods in my C# program.
I'm converting an application to use iTextSharp instead of PdfSharp.
MemoryStream pdfstream = new MemoryStream();
/* Convert the attachment to an byte array */
byte[] pdfarray = (byte[])dr["Data"];
/* Write the attachment into the memory */
pdfstream.Write(pdfarray, 0, pdfarray.Length);
/* Set the memorystream to the beginning */
pdfstream.Seek(0, System.IO.SeekOrigin.Begin);
/* Open the pdf document */
PdfSharp.Pdf.PdfDocument document = PdfSharp.Pdf.IO.PdfReader.Open(pdfstream, PdfDocumentOpenMode.Modify);
//iTextSharp.text.Document doc1 = iTextSharp.text.pdf.PdfReader.GetStreamBytes(
//ITS.pdf.PdfReader rdr = ITS.pdf.PdfReader(
string filename = DateTime.Now.Ticks.ToString() + "_" + dr["AttachmentName"].ToString();
string path = Path.Combine(FolderName, filename);
document.Save(path);

I think you can do something like this (note code not run or tested, might need a tweak):
using (MemoryStream ms = new MemoryStream())
{
Document doc = new Document(PageSize.A4, 50, 50, 15, 15);
PdfWriter writer = PdfWriter.GetInstance(doc, ms);
using (var rdr = new PdfReader(filePath))
{
PdfImportedPage page;
for(int i = 1; i <= rdr.PageCount; i++)
{
page = writer.GetImportedPage(templateReader, i)
writer.DirectContent.AddTemplate(page, 0, 0);
doc.NewPage();
}
}
}
This will read in the PDF page by page and output it to your document.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Convert html to pdf and merge it with existing pdfs - c#

Related

how read pdf layers and add to another page using itextsharp?

MemoryStream PDF file in ASP.NET Core 2

Not able to get Unicode support for different languages

Copying PDF without loose form field structure with ItextSharp

convert a pdfreader to a pdfdocument

Categories

Resources