I'd like to convert several different web pages into one PDF document. I found Pechkin / TuesPechkin, which has been a wonderful discovery, but I am running into one problem: only the last Object gets converted, and all the other PDF pages are blank. What could be causing this problem?
var document = new HtmlToPdfDocument
{
    GlobalSettings =
    {
        Margins =
        {
            All = 1.375,
            Unit = Unit.Centimeters
        }
    }
};
// Each "page" variable contains one HTML page
foreach (var page in pages)
    document.Objects.Add(new ObjectSettings { HtmlText = page.Html });
// Create converter
var converter = Factory.Create();
// Convert!
var result = converter.Convert(document);
// Save
File.WriteAllBytes(path, result);
Turns out that this is a confirmed bug.
https://github.com/tuespetre/TuesPechkin/issues/23
I ended up solving the issue by generating one page at a time and merging the pages with iTextSharp.
I have written code in .NET Core that converts HTML into PDF. The NuGet package I used for this conversion is SelectPdf.
SelectPdf.HtmlToPdf pdf = new SelectPdf.HtmlToPdf();
System.Drawing.SizeF size = new System.Drawing.SizeF(750, 750);
pdf.Options.PdfPageCustomSize = size;
pdf.Options.PdfPageSize = SelectPdf.PdfPageSize.A4;
SelectPdf.PdfDocument pdfDoc = pdf.ConvertHtmlString(html);
using var ms = new MemoryStream();
pdfDoc.Save(ms);
return ms.ToArray();
The code works fine, and the page size is currently A4. The problem is that when the HTML contains a large amount of data, the content is split across several pages.
Is there any way to keep all of the HTML content on a single page in the PDF?
You can try the AutoFit options:
var converter = new HtmlToPdf
{
    Options =
    {
        // set converter options
        PdfPageSize = PdfPageSize.A4,
        PdfPageOrientation = PdfPageOrientation.Landscape,
        AutoFitHeight = HtmlToPdfPageFitMode.AutoFit,
        AutoFitWidth = HtmlToPdfPageFitMode.AutoFit,
        // use the CSS @media print rules
        CssMediaType = HtmlToPdfCssMediaType.Print
    }
};
I'm merging PDF files using GemBox.Pdf as shown here. This works great and I can easily add outlines.
I've previously done a similar thing and merged Word files with GemBox.Document as shown here.
But now my problem is that there is no TOC element in GemBox.Pdf. I want a Table of Contents to be generated automatically when merging multiple PDF files into one.
Am I missing something or is there really no such element for PDF?
Do I need to recreate it, if yes then how would I do that?
I can add a bookmark, but I don't know how to add a link to it.
There is no such element in PDF files, so we need to create this content ourselves.
Now one way would be to create text elements, outlines, and link annotations, position them appropriately, and set the link destinations to outlines.
However, this could be quite some work so perhaps it would be easier to just create the desired TOC element with GemBox.Document, save it as a PDF file, and then import it into the resulting PDF.
// Source data for creating TOC entries with specified text and associated PDF files.
var pdfEntries = new[]
{
    new { Title = "First Document Title", Pdf = PdfDocument.Load("input1.pdf") },
    new { Title = "Second Document Title", Pdf = PdfDocument.Load("input2.pdf") },
    new { Title = "Third Document Title", Pdf = PdfDocument.Load("input3.pdf") },
};
/***************************************************************/
/* Create new document with TOC element using GemBox.Document. */
/***************************************************************/
// Create new document.
var tocDocument = new DocumentModel();
var section = new Section(tocDocument);
tocDocument.Sections.Add(section);
// Create and add TOC element.
var toc = new TableOfEntries(tocDocument, FieldType.TOC);
section.Blocks.Add(toc);
section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
// Create heading style.
// By default, when the TOC element is updated, a TOC entry is created for each paragraph that has a heading style.
var heading1Style = (ParagraphStyle)tocDocument.Styles.GetOrAdd(StyleTemplateType.Heading1);
// Add heading and empty (placeholder) pages.
// The number of placeholder pages added depends on the number of pages in the actual PDF file, so that the TOC entries get correct page numbers.
int totalPageCount = 0;
foreach (var pdfEntry in pdfEntries)
{
    section.Blocks.Add(new Paragraph(tocDocument, pdfEntry.Title) { ParagraphFormat = { Style = heading1Style } });
    section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
    int currentPageCount = pdfEntry.Pdf.Pages.Count;
    totalPageCount += currentPageCount;
    while (--currentPageCount > 0)
        section.Blocks.Add(new Paragraph(tocDocument, new SpecialCharacter(tocDocument, SpecialCharacterType.PageBreak)));
}
// Remove last extra-added empty page.
section.Blocks.RemoveAt(section.Blocks.Count - 1);
// Update TOC element and save the document as PDF stream.
toc.Update();
var pdfStream = new MemoryStream();
tocDocument.Save(pdfStream, new GemBox.Document.PdfSaveOptions());
/***************************************************************/
/* Merge PDF files into PDF with TOC element using GemBox.Pdf. */
/***************************************************************/
// Load a PDF stream using GemBox.Pdf.
var pdfDocument = PdfDocument.Load(pdfStream);
var rootDictionary = (PdfDictionary)((PdfIndirectObject)pdfDocument.GetDictionary()[PdfName.Create("Root")]).Value;
var pagesDictionary = (PdfDictionary)((PdfIndirectObject)rootDictionary[PdfName.Create("Pages")]).Value;
var kidsArray = (PdfArray)pagesDictionary[PdfName.Create("Kids")];
var pageIds = kidsArray.Cast<PdfIndirectObject>().Select(obj => obj.Id).ToArray();
// Remove empty (placeholder) pages.
while (totalPageCount-- > 0)
    pdfDocument.Pages.RemoveAt(pdfDocument.Pages.Count - 1);
// Add pages from PDF files.
foreach (var pdfEntry in pdfEntries)
    foreach (var page in pdfEntry.Pdf.Pages)
        pdfDocument.Pages.AddClone(page);
/*****************************************************************************/
/* Update TOC links from placeholder pages to actual pages using GemBox.Pdf. */
/*****************************************************************************/
// Create a mapping from the ID of an empty (placeholder) page indirect object to the actual page indirect object.
var pageCloneMap = new Dictionary<PdfIndirectObjectIdentifier, PdfIndirectObject>();
for (int i = 0; i < kidsArray.Count; ++i)
    pageCloneMap.Add(pageIds[i], (PdfIndirectObject)kidsArray[i]);
foreach (var entry in pageCloneMap)
{
    // If the page was updated, it means that we have passed the TOC pages, so break from the loop.
    if (entry.Key != entry.Value.Id)
        break;
    // For each TOC page, get its 'Annots' entry.
    // For each link annotation from the 'Annots', get the 'Dest' entry.
    // Update the first item in the 'Dest' array so that it no longer points to a removed page.
    if (((PdfDictionary)entry.Value.Value).TryGetValue(PdfName.Create("Annots"), out PdfBasicObject annotsObj))
        foreach (PdfIndirectObject annotObj in (PdfArray)annotsObj)
            if (((PdfDictionary)annotObj.Value).TryGetValue(PdfName.Create("Dest"), out PdfBasicObject destObj))
            {
                var destArray = (PdfArray)destObj;
                destArray[0] = pageCloneMap[((PdfIndirectObject)destArray[0]).Id];
            }
}
// Save resulting PDF file.
pdfDocument.Save("Result.pdf");
pdfDocument.Close();
This way you can easily customize the TOC element by using the TOC switches and styles. For more info, see the Table Of Content example from GemBox.Document.
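The page-number bookkeeping behind the placeholder pages can be looked at in isolation. The following is only a minimal sketch that uses no GemBox types at all; the class and method names (TocPageMath, StartPages) are hypothetical and not part of either library. It assumes the TOC occupies the first pages of the merged file and each source PDF follows in order, which is why every heading has to be followed by as many placeholder pages as its PDF has pages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of GemBox.Document or GemBox.Pdf.
public static class TocPageMath
{
    // Returns the 1-based page number on which each merged document starts,
    // assuming the TOC takes up the first tocPageCount pages of the result.
    public static int[] StartPages(int tocPageCount, IReadOnlyList<int> pageCounts)
    {
        var starts = new int[pageCounts.Count];
        int next = tocPageCount + 1;
        for (int i = 0; i < pageCounts.Count; i++)
        {
            starts[i] = next;       // the heading (TOC target) sits on this page
            next += pageCounts[i];  // skip over all pages of this document
        }
        return starts;
    }
}
```

For a one-page TOC and source files of 3, 1 and 2 pages, StartPages(1, new[] { 3, 1, 2 }) gives 2, 5 and 6, which are the page numbers the TOC entries should show after the placeholders are swapped for the real pages.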
I have generated a Word file using Open XML and need to send it as a PDF attachment in an email, but I cannot save any physical PDF or Word file to disk because my application runs in a cloud environment (CRM Online).
The only way I have found is Aspose.Words for .NET.
http://www.aspose.com/docs/display/wordsnet/How+to++Convert+a+Document+to+a+Byte+Array But it is too expensive.
Then I found another approach: convert the Word document to HTML, and then convert the HTML to PDF. But my Word document contains a picture, and I cannot get that to work.
The most accurate conversion from DOCX to PDF is going to be through Word. Your best option for that is setting up a server with OWAS (Office Web Apps Server) and doing your conversion through that.
You'll need to set up a WOPI endpoint on your application server and call:
/wv/WordViewer/request.pdf?WOPISrc={WopiUrl}&type=downloadpdf
OR
/wv/WordViewer/request.pdf?WOPISrc={WopiUrl}&type=printpdf
Alternatively you could try and do it using OneDrive and Word Online, but you'll need to work out the parameters Word Online uses as well as whether that's permitted within the Ts & Cs.
You can try Gnostice XtremeDocumentStudio .NET.
Converting From DOCX To PDF Using XtremeDocumentStudio .NET
http://www.gnostice.com/goto.asp?id=24900&t=convert_docx_to_pdf_using_xdoc.net
In the published article, conversion has been demonstrated to save to a physical file. You can use documentConverter.ConvertToStream method to convert a document to a Stream as shown below in the code snippet.
DocumentConverter documentConverter = new DocumentConverter();
// input can be a FilePath, Stream, list of FilePaths or list of Streams
Object input = "InputDocument.docx";
string outputFileFormat = "pdf";
ConversionMode conversionMode = ConversionMode.ConvertToSeperateFiles;
List<Stream> outputStreams = documentConverter.ConvertToStream(input, outputFileFormat, conversionMode);
Disclaimer: I work for Gnostice.
If you want to convert a byte array, you can use Metamorphosis:
string docxPath = @"example.docx";
string pdfPath = Path.ChangeExtension(docxPath, ".pdf");
byte[] docx = File.ReadAllBytes(docxPath);
// Convert DOCX to PDF in memory
// (the converter instance was missing from the snippet; SautinSoft.PdfMetamorphosis is assumed here)
SautinSoft.PdfMetamorphosis p = new SautinSoft.PdfMetamorphosis();
byte[] pdf = p.DocxToPdfConvertByte(docx);
if (pdf != null)
{
    // Save the PDF document to a file for viewing purposes.
    File.WriteAllBytes(pdfPath, pdf);
    System.Diagnostics.Process.Start(pdfPath);
}
else
{
    System.Console.WriteLine("Conversion failed!");
    Console.ReadLine();
}
I recently used SautinSoft's 'Document .Net' library to convert DOCX to PDF in my React (frontend) and .NET Core (microservices backend) application. It takes only 15 seconds to generate a 23-page PDF. Those 15 seconds include getting the data from the database, merging it with the DOCX template, and converting the result to PDF. The code has been deployed to an Azure Linux box and works fine.
https://sautinsoft.com/products/document/
Sample code
public string GeneratePDF(PDFDocumentModel document)
{
    byte[] output = null;
    using (var outputStream = new MemoryStream())
    {
        // Create single pdf.
        DocumentCore singlePDF = new DocumentCore();
        var documentCores = new List<DocumentCore>();
        foreach (var section in document.Sections)
        {
            documentCores.Add(GenerateDocument(section));
        }
        foreach (var dc in documentCores)
        {
            // Create import session.
            ImportSession session = new ImportSession(dc, singlePDF, StyleImportingMode.KeepSourceFormatting);
            // Loop through all sections in the source document.
            foreach (Section sourceSection in dc.Sections)
            {
                // Because we are copying a section from one document to another,
                // it is required to import the Section into the destination document.
                // This adjusts any document-specific references to styles, bookmarks, etc.
                // Importing an element creates a copy of the original element, but the copy
                // is ready to be inserted into the destination document.
                Section importedSection = singlePDF.Import<Section>(sourceSection, true, session);
                // The first section of each source document starts on a new page.
                if (dc.Sections.IndexOf(sourceSection) == 0)
                    importedSection.PageSetup.SectionStart = SectionStart.NewPage;
                // Now the new section can be appended to the destination document.
                singlePDF.Sections.Add(importedSection);
                // Paging
                HeaderFooter footer = new HeaderFooter(singlePDF, HeaderFooterType.FooterDefault);
                // Create a new paragraph to hold the page numbering,
                // so that it reads: Page N of M.
                Paragraph par = new Paragraph(singlePDF);
                par.ParagraphFormat.Alignment = HorizontalAlignment.Center;
                CharacterFormat cf = new CharacterFormat() { FontName = "Consolas", Size = 11.0 };
                par.Content.Start.Insert("Page ", cf.Clone());
                // Page numbering is a Field.
                Field fPage = new Field(singlePDF, FieldType.Page);
                fPage.CharacterFormat = cf.Clone();
                par.Content.End.Insert(fPage.Content);
                par.Content.End.Insert(" of ", cf.Clone());
                Field fPages = new Field(singlePDF, FieldType.NumPages);
                fPages.CharacterFormat = cf.Clone();
                par.Content.End.Insert(fPages.Content);
                footer.Blocks.Add(par);
                importedSection.HeadersFooters.Add(footer);
            }
        }
        var pdfOptions = new PdfSaveOptions();
        pdfOptions.Compression = false;
        pdfOptions.EmbedAllFonts = false;
        pdfOptions.EmbeddedImagesFormat = PdfSaveOptions.EmbImagesFormat.Png;
        pdfOptions.EmbeddedJpegQuality = 100;
        // Don't allow editing after population; this also ensures the content can be printed.
        pdfOptions.PreserveFormFields = false;
        pdfOptions.PreserveContentControls = false;
        if (!string.IsNullOrEmpty(document.PdfProperties.Title))
        {
            singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Title] = document.PdfProperties.Title;
        }
        if (!string.IsNullOrEmpty(document.PdfProperties.Author))
        {
            singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Author] = document.PdfProperties.Author;
        }
        if (!string.IsNullOrEmpty(document.PdfProperties.Subject))
        {
            singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Subject] = document.PdfProperties.Subject;
        }
        singlePDF.Save(outputStream, pdfOptions);
        output = outputStream.ToArray();
    }
    return Convert.ToBase64String(output);
}
I have been trying to get my MVC application to create PDF files based on MVC views. I got this working with plain HTML, but I would also like to include the CSS files that I use for the browser. Some of them work, but with one I get the following error:
An exception of type 'System.FormatException' occurred in mscorlib.dll but was not handled in user code
Additional information: Input string was not in a correct format.
I am using the following code:
var data = GetHtml(new IndexModel(Context), "~\\Views\\Home\\Index.cshtml", "");
using (var document = new iTextSharp.text.Document())
{
    //define output control HTML
    var memStream = new MemoryStream();
    TextReader xmlString = new StringReader(data);
    PdfWriter writer = PdfWriter.GetInstance(document, new FileStream("c:\\tmp\\my.pdf", FileMode.OpenOrCreate));
    //open doc
    document.Open();
    // register all fonts in current computer
    FontFactory.RegisterDirectories();
    // Set factories
    var htmlContext = new HtmlPipelineContext(null);
    htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
    // Set css
    ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
    cssResolver.AddCssFile(HttpContext.Server.MapPath("~/Content/elements.css"), true);
    cssResolver.AddCssFile(HttpContext.Server.MapPath("~/Content/style.css"), true);
    cssResolver.AddCssFile(HttpContext.Server.MapPath("~/Content/jquery-ui.css"), true);
    // Export
    IPipeline pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(document, writer)));
    var worker = new XMLWorker(pipeline, true);
    var xmlParse = new XMLParser(true, worker);
    xmlParse.Parse(xmlString);
    xmlParse.Flush();
    document.Close();
}
The string "data" is correct and has no issues; the problem lies with AddCssFile().
If I create the PDF without any CSS files, everything works, but including the CSS files triggers the error.
Help will be very much appreciated.
I don't know the exact answer, but by looking at the error you are getting back, I would try two different approaches.
Change
cssResolver.AddCssFile(HttpContext.Server.MapPath("~/Content/elements.css"), true);
to something like
var cssPath = HttpContext.Server.MapPath("~/Content/elements.css");
cssResolver.AddCssFile(cssPath, true);
Then set a breakpoint and look at the values being returned for cssPath. Make sure they are accurate and do not contain any odd characters.
Second approach... If all else fails, try giving an absolute URL to the CSS resource such as http://yourdomain.com/cssPath instead of a file system path.
If either of those two approaches helps, you can use it to determine the actual problem and then refactor to your heart's content.
UPDATE ------------------------------------------------------------------>
According to the documentation, you need an absolute URL for the file, so Server.MapPath won't work.
addCssFile
void addCssFile(String href, boolean isPersistent) throws CssResolverException
Add a
Parameters:
href - the link to the css file ( an absolute uri )
isPersistent - true if the added css should not be deleted on a call to clear
Throws:
CssResolverException - thrown if something goes wrong
In that case, I would try using something like:
public string AbsoluteContent(string contentPath)
{
    var path = Url.Content(contentPath);
    var url = new Uri(HttpContext.Current.Request.Url, path);
    return url.AbsoluteUri;
}
and use it like this:
var cssPath = AbsoluteContent("~/Content/embeddedCss/yourcssfile.css");
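To see what the Uri constructor does in that helper without a running web request, here is a minimal sketch using only System.Uri. The class name CssUrlSketch, the method ToAbsolute, and the example.com addresses are made up for illustration; the site-relative path stands in for whatever Url.Content would return in your application.

```csharp
using System;

// Hypothetical stand-alone version of the URL-building step; no ASP.NET types involved.
public static class CssUrlSketch
{
    // Resolves a site-relative path (e.g. "/Content/style.css")
    // against the URL of the current request and returns an absolute URL.
    public static string ToAbsolute(Uri requestUrl, string sitePath)
    {
        return new Uri(requestUrl, sitePath).AbsoluteUri;
    }
}
```

With a request URL of http://example.com/Home/Index and the path /Content/style.css, this returns http://example.com/Content/style.css: the scheme and host come from the request, and the path of the request itself is discarded.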
I am porting an existing app from Java to C#. The original app used the iText library to fill PDF form templates and save them as new PDFs. My C# code (example) is below:
string templateFilename = @"C:\Templates\test.pdf";
string outputFilename = @"C:\Output\demo.pdf";
using (var existingFileStream = new FileStream(templateFilename, FileMode.Open))
{
    using (var newFileStream = new FileStream(outputFilename, FileMode.Create))
    {
        var pdfReader = new PdfReader(existingFileStream);
        var stamper = new PdfStamper(pdfReader, newFileStream);
        var form = stamper.AcroFields;
        var fieldKeys = form.Fields.Keys;
        foreach (string fieldKey in fieldKeys)
        {
            form.SetField(fieldKey, "REPLACED!");
        }
        stamper.FormFlattening = true;
        stamper.Close();
        pdfReader.Close();
    }
}
Everything works only if I omit the
stamper.FormFlattening = true;
line, but then the forms are visible as... forms.
When I add this line, any values set on the form fields are lost, resulting in a blank form. I would really appreciate any advice.
Most likely you can resolve this when using iTextSharp 5.4.4 (or later) by forcing iTextSharp to generate appearances for the form fields. In your example code:
var form = stamper.AcroFields;
form.GenerateAppearances = true;
I resolved the issue by using a previous version of iTextSharp (5.4.3). I'm not sure what the cause is, though...
I found a working solution for this that applies to any of the newer iTextSharp versions.
This is what we were doing:
1- Create a copy of the PDF template.
2- Populate the copy with data.
3- Set FormFlattening = true and SetFullCompression.
4- Combine some of the PDFs into a new document.
5- Move the new combined document and then remove the temp file.
Done this way, the field input was lost; if we skipped the form flattening, it looked OK.
However, when we took "FormFlattening = true" out of step 3 and ran it as a separate step after the move was complete, it worked perfectly.
Hope I explained somewhat ok :)
In your PDF file, change the field property to Visible; the default value is "Visible but not printable".