XMLWorkerHelper convert html page into pdf produces only first 2 pages

I'm getting only the first two pages. The third page contains a generated list of elements, and when my collection has too many elements, every page from there on is blank in my PDF output.
using (FileStream fs = new FileStream(filePath, FileMode.Create))
{
    Document document = new Document(PageSize.A4, 25, 25, 30, 30);
    WebClient wc = new WebClient();
    string htmlText = wc.DownloadString(textUrl);
    PdfWriter pdfWriter = PdfWriter.GetInstance(document, fs);
    document.Open();
    // register all fonts in current computer
    FontFactory.RegisterDirectories();
    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
    using (var msHtml = new MemoryStream(System.Text.Encoding.Default.GetBytes(htmlText)))
    {
        //Set factories
        var cssAppliers = new CssAppliersImpl(fontProvider);
        var htmlContext = new HtmlPipelineContext(cssAppliers);
        //HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
        //FontFactory.Register(arialuniTff);
        string gishaTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "GISHA.TTF");
        FontFactory.Register(gishaTff);
        var worker = XMLWorkerHelper.GetInstance();
        var cssStream = new FileStream(FolderMapPath("/css/style.css"), FileMode.Open);
        worker.ParseXHtml(pdfWriter, document, msHtml, cssStream, new UnicodeFontFactory());
    }
    // Close the document
    document.Close();
    // Close the writer instance
    pdfWriter.Close();
}
Here is my cshtml code

I only have experience working with iText in Java, but is it possible that the MemoryStream object you are using has a byte limit that fills up when the table on page 3 has too many elements to store? If so, the closing tags of that long table may never be written to the MemoryStream, so the table and everything after it isn't rendered; in other words, it gets truncated by the PDF converter engine.
Can you try using a different Stream object?

Here is how I solved my problem. The problem wasn't in the C# backend code. It seems that XMLWorkerHelper currently doesn't deal well with loops in the view. I had to display a list of items in my PDF file. The result was fine when the collection didn't contain many items, but the output broke once the collection held more than about 50 items, because they could not all be displayed on a single page. What I did was count the items as I rendered them and, at some number like 40, emit a break element, <li style="list-style:none; list-style-type:none; page-break-before:always">@item</li>, then reset the counter and continue displaying my items. That worked, and my problem was resolved.
Maybe this will be helpful for someone.
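The counting approach described above could look roughly like the following sketch. It is written in plain Java rather than Razor so that it stands alone; the class name `PageBreakList`, the method `render`, and the `itemsPerPage` parameter are hypothetical, and the sketch assumes the item text is already HTML-safe.

```java
import java.util.List;

final class PageBreakList {
    // Style taken from the break element quoted in the answer above.
    static final String BREAK_STYLE =
            "list-style:none; list-style-type:none; page-break-before:always";

    /**
     * Renders the items as <li> elements and forces a page break before
     * every item that starts a new group of itemsPerPage entries.
     */
    static String render(List<String> items, int itemsPerPage) {
        if (itemsPerPage <= 0) {
            throw new IllegalArgumentException("itemsPerPage must be positive");
        }
        StringBuilder html = new StringBuilder("<ul>\n");
        int counter = 0;
        for (String item : items) {
            if (counter == itemsPerPage) {
                // The current page is full: break before this item and reset.
                html.append("<li style=\"").append(BREAK_STYLE).append("\">");
                counter = 0;
            } else {
                html.append("<li>");
            }
            html.append(item).append("</li>\n");
            counter++;
        }
        return html.append("</ul>").toString();
    }
}
```

Calling `PageBreakList.render(items, 40)` would mirror the threshold of 40 mentioned in the answer; the right value depends on the page size and font, so it is best treated as something to tune.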

Related

Setting my paragraphs to begin at the top margin with iTextSharp

I'm trying to build PDFs to digitize our reporting system at my company. I've used iTextSharp and so far it looks great, but my margins don't seem to be working properly. I've set the margins in a config file, and the left and right margins are working great, but my paragraph seems to start about 30% down the page regardless of the top and bottom margins. Here's the code I'm using:
public int PrintPdf()
{
    //Getting the path
    //Path.GetFileNameWithoutExtension("Test_Doc_Print") + ".pdf");
    object OutputFileName = this._path;
    //Making the PDF Doc
    iTextSharp.text.Document PDFReport = new iTextSharp.text.Document
    (
        PageSize.A4.Rotate(),
        /*this._left,
        this._right,
        this._top,
        this._bottom*/
        10,
        10,
        10,
        10
    );
    //Setting The Font
    string fontpath = @"C:\Windows\Fonts\";
    BaseFont monoFont = BaseFont.CreateFont(fontpath + "Consola.ttf", BaseFont.CP1252, BaseFont.EMBEDDED);
    Font fontPDF = new Font(monoFont, this._fontSize);
    // create file stream for writing the PDF
    FileStream fs = new FileStream(this._path, FileMode.Create, FileAccess.ReadWrite);
    //FileStream fs = new FileStream(@"c:\\Reportlocation", FileMode.Create);
    // Create an FCFC scan object to convert TextToPrint page at a time
    FCFCScanner page = new FCFCScanner(this._text);
    // Create PDF writer and associate with file stream
    //iTextSharp.text.pdf.PdfWriter writer = new iTextSharp.text.pdf.PdfWriter.GetInstance(PDFReport, fs);
    PdfWriter writer = PdfWriter.GetInstance(PDFReport, fs);
    //Opening the PDF Doc
    PDFReport.Open();
    //Load each page from the string into the PDF
    page.NextPage();
    do
    {
        if (page.PageLength > 0)
        {
            Paragraph prg = new Paragraph(page.Page, fontPDF);
            PDFReport.Add(prg);
            PDFReport.NewPage();
            page.NextPage();
        }
    } while (page.MorePages);
    PDFReport.Close();
    return (0);
}
I've set the margins to 10 (hardcoded for now) to show what I'm working with. This program should read a string that I send it from my StringBuilder class.
It's designed to receive one page of text at a time and to convert it into a PDF document.
Ok, so the problem: when the page is built, the first paragraph doesn't begin at the top margin. If I reduce the margin, it doesn't shift the paragraph up to the new margin. It's causing my PDFs to be much longer as the text that fits easily on a printer page takes two PDF pages. Any help with getting my paragraph to simply begin at the top margin would be really appreciated.
I'm relatively new to programming and this is my first post, so if you need more information, let me know and I'll add more info.

How to generate multiple pages using PdfWriter

I am generating a PDF file for payslips using PdfWriter in C#. The PDF is built from HTML: for every user the code creates a table (<table>...</table>), and every table should be displayed on a new page.
But all the tables are displayed on the same page.
For example, the expected output is:
Page 1
Employee 1 Details
(the details may continue onto the next page)
Page 2
Employee 2 Details
Page 3
Employee 2 details
Page 4
Employee 3 details
.....
.....
....
But my current output is:
Page 1
Employee 1
Employee 2
Page 2
Employee 3
Employee 4
Employee 5
.....
My code is
StringBuilder stb = new StringBuilder();
stb.Append(All.ToString());
EXP.InnerHtml = stb.ToString();
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition", "attachment;filename=" + filename + ".pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter stringWriter = new StringWriter();
HtmlTextWriter htmlTextWriter = new HtmlTextWriter(stringWriter);
string resHtml = "";
for(int i=0;i<10;i++)
{
    resHtml+="<table width='100%'><tr><td align='center'>payslip"+ i+"</td></tr></table>";
}
StringReader stringReader = new StringReader(resHtml);
Doc = new Document(PageSize.A2, 10f, 10f, 50f, 20f);
HTMLWorker htmlparser = new HTMLWorker(Doc);
PdfWriter.GetInstance(Doc, Response.OutputStream);
Doc.Open();
htmlparser.Open();
htmlparser.Parse(stringReader);
htmlparser.Close();
Doc.Close();
Response.Write(Doc);
Response.End();

You are using HTMLWorker. That class is deprecated: it is no longer supported as it has been abandoned in favor of XML Worker. There are different ways to solve your problem.
Create multiple small HTML files instead of one big HTML
I wouldn't create one long HTML document holding the tables of all employees, but a separate small HTML snippet with a single table for every employee, and I would call document.NewPage() after adding every table.
See Answer #2 to the question How to parse multiple HTML files into a single PDF?
This is some Java code (you can read it as if it were pseudo code):
public void createPdf(Employees employees) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream(file));
    document.open();
    String css = readCSS();
    for (Employee employee : employees) {
        String html = createHtml(employee);
        ElementList list = XMLWorkerHelper.parseToElementList(html, css);
        for (Element e : list) {
            document.add(e);
        }
        document.newPage();
    }
    document.close();
}
This solution is the best solution from the point of view of memory and CPU use.
Create one big HTML and introduce page breaks
Another option is to introduce a page break before every employee table. See set new page in HTML using iTextSharp HTMLWorker (html to pdf)
This isn't a good idea as you build up a large chunk of data in memory and that memory can only be released after the PDF is rendered. iTextSharp tries to flush pages to the OutputStream as soon as possible. If you create small HTML files, and add them to the PDF immediately, you can discard the HTML bytes from memory sooner rather than later and iTextSharp will also be able to flush content streams to the output, releasing memory that is needed to store that content.
Important notice:
Obviously, these answers imply that you do the right thing. That is: throw away your code that relies on the abandoned HTMLWorker and start using XML Worker.

You can append pagebreak after every tag and append before tag.
This will give you a string like,
.................
following is the code to split the html string.
Dim myString As String = sb.ToString()
Dim mySplit As String = "pagebreak"
Dim myResult() As String = myString.Split(New String() {mySplit}, StringSplitOptions.None)
To render each html string on new page,
Dim pdfDoc As New Document(PageSize.A4, 10.0F, 10.0F, 10.0F, 0.0F)
Dim htmlparser As New HTMLWorker(pdfDoc)
Using memoryStream As New MemoryStream()
    Dim writer As PdfWriter = PdfWriter.GetInstance(pdfDoc, memoryStream)
    pdfDoc.Open()
    For Each r As String In myResult
        Dim sr As New StringReader(r)
        htmlparser.Parse(sr)
        pdfDoc.NewPage()
        sr.Dispose()
    Next
    pdfDoc.Close()
    Dim bytes As Byte() = memoryStream.ToArray()
    memoryStream.Close()
    Response.Clear()
    Response.ContentType = "application/pdf"
    Response.AddHeader("Content-Disposition", "attachment;filename=Report.pdf")
    Response.Buffer = True
    Response.Cache.SetCacheability(HttpCacheability.NoCache)
    Response.BinaryWrite(bytes)
    Response.[End]()
    Response.Close()
End Using
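For readers following the Java answers on this page, the splitting step above might be sketched as below. The marker string "pagebreak" comes from the answer; the class name `HtmlPageSplitter` and the placement of the marker between tables are assumptions, since the answer's example string did not survive. Unlike the VB `Split` call with `StringSplitOptions.None`, this sketch drops blank fragments, so a trailing marker does not lead to an extra empty page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class HtmlPageSplitter {
    // Marker text inserted between the HTML fragments (taken from the answer).
    static final String MARKER = "pagebreak";

    /** Splits the HTML on every marker and skips fragments that are blank. */
    static List<String> splitPages(String html) {
        List<String> pages = new ArrayList<>();
        // Pattern.quote makes split treat the marker as literal text, not a regex.
        for (String part : html.split(Pattern.quote(MARKER))) {
            if (!part.trim().isEmpty()) {
                pages.add(part);
            }
        }
        return pages;
    }
}
```

Each returned fragment would then be parsed and followed by a new page, as in the loop above. Note that a marker like "pagebreak" could collide with real text in the HTML, so a more unusual token may be safer.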

If your HTML content is fixed, then you can use page breaks, but if your HTML content is variable, it will be difficult to predict where each page starts and finishes.

PDF generated with iTextSharp always prompts to save changes when closing. And has missing pages when viewed with non-Acrobat PDF readers

I've recently used iTextSharp to create a PDF by importing the 20 pages from an existing PDF and then adding a dynamically generated link to the bottom of the last page. It works fine... kind of. Viewing the generated PDF in Acrobat Reader on a windows PC displays everything as expected although when closing the document it always asks "Do you want to save changes?". Viewing the generated PDF on a Surface Pro with PDF Reader displays the document without the first and last pages. Apparently on a mobile device using Polaris Office the first and last pages are also missing.
I'm wondering if when the new PDF is generated it's not getting closed off quite properly and that's why it asks "Do you want to save changes?" when closing it. And maybe that's also why it doesn't display correctly in some PDF reader apps.
Here's the code:
using (var reader = new PdfReader(HostingEnvironment.MapPath("~/app/pdf/OriginalDoc.pdf")))
{
    using (
        var fileStream =
            new FileStream(
                HostingEnvironment.MapPath("~/documents/attachments/DocWithLink_" + id + ".pdf"),
                FileMode.Create, FileAccess.Write))
    {
        var document = new Document(reader.GetPageSizeWithRotation(1));
        var writer = PdfWriter.GetInstance(document, fileStream);
        using (PdfStamper stamper = new PdfStamper(reader, fileStream))
        {
            var baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252,
                BaseFont.NOT_EMBEDDED);
            Font linkFont = FontFactory.GetFont("Arial", 12, Font.UNDERLINE, BaseColor.BLUE);
            document.Open();
            for (var i = 1; i <= reader.NumberOfPages; i++)
            {
                document.NewPage();
                var importedPage = writer.GetImportedPage(reader, i);
                // Copy page of original document to new document.
                var contentByte = writer.DirectContent;
                contentByte.AddTemplate(importedPage, 0, 0);
                if (i == reader.NumberOfPages) // It's the last page so add link.
                {
                    PdfContentByte cb = stamper.GetOverContent(i);
                    //Create a ColumnText object
                    var ct = new ColumnText(cb);
                    //Set the rectangle to write to
                    ct.SetSimpleColumn(100, 30, 500, 90, 0, PdfContentByte.ALIGN_LEFT);
                    //Add some text and make it blue so that it looks like a hyperlink
                    var c = new Chunk("Click here!", linkFont);
                    var congrats = new Paragraph("Congratulations on reading the eBook! ");
                    congrats.Alignment = PdfContentByte.ALIGN_LEFT;
                    c.SetAnchor("http://www.domain.com/pdf/response/" + encryptedId);
                    //Add the chunk to the ColumnText
                    congrats.Add(c);
                    ct.AddElement(congrats);
                    //Tell the system to process the above commands
                    ct.Go();
                }
            }
        }
    }
}
I've looked at these posts with similar issues but none seem to quite provide the answer I need:
iTextSharp-generated PDFs cause save dialog when closing
Using iTextSharp to write data to PDF works great, but Acrobat Reader asks 'Do you want to save changes' when closing file
(Or they refer to memory streams instead of writing to disk etc)
My question is, how do I modify the above so that when closing the generated PDF in Acrobat Reader there's no "Do you want to save changes?" prompt. The answer to that may solve the problems with missing pages on Surface Pro etc but if you know anything else about what might be causing that I'd like to hear about it.
Any suggestions would be very welcome! Thanks!

At first glance (and without much coffee yet) it appears that you're using a PdfReader in three different contexts: as a source for a PdfStamper, as a source for a Document, and as a source for importing. So you are essentially importing a document into itself while also writing to it.
To give you a quick overview, the following code will essentially clone the contents of source.pdf into dest.pdf:
using (var reader = new PdfReader("source.pdf")){
    using (var fileStream = new FileStream("dest.pdf", FileMode.Create, FileAccess.Write)){
        using (PdfStamper stamper = new PdfStamper(reader, fileStream)){
        }
    }
}
Since that does all of the cloning for you, you don't need to import pages or anything.
Then, if the only thing that you want to do is add some text to the last page, you can just use the above and ask the PdfStamper for a PdfContentByte using GetOverContent(), telling it which page number you're interested in. Then you can just use the rest of your ColumnText logic.
using (var reader = new PdfReader("Source.Pdf")) {
    using (var fileStream = new FileStream("Dest.Pdf", FileMode.Create, FileAccess.Write)) {
        using (PdfStamper stamper = new PdfStamper(reader, fileStream)) {
            //Get a PdfContentByte object
            var cb = stamper.GetOverContent(reader.NumberOfPages);
            //Create a ColumnText object
            var ct = new ColumnText(cb);
            //Set the rectangle to write to
            ct.SetSimpleColumn(100, 30, 500, 90, 0, PdfContentByte.ALIGN_LEFT);
            //Add some text and make it blue so that it looks like a hyperlink
            //(linkFont and encryptedId are defined as in the question's code)
            var c = new Chunk("Click here!", linkFont);
            var congrats = new Paragraph("Congratulations on reading the eBook! ");
            congrats.Alignment = PdfContentByte.ALIGN_LEFT;
            c.SetAnchor("http://www.domain.com/pdf/response/" + encryptedId);
            //Add the chunk to the ColumnText
            congrats.Add(c);
            ct.AddElement(congrats);
            //Tell the system to process the above commands
            ct.Go();
        }
    }
}

MVC - Generating multiple PDFs

I am using the following code to generate a PDF file.
It works well, but now I want to generate 4 PDFs at the same time.
I tried initializing the Document again and repeating the whole code to generate a 2nd PDF report, but it generates only 1 PDF.
var document = new Document(PageSize.A4, 50, 50, 25, 25);
// Create a new PdfWrite object, writing the output to a MemoryStream
var output = new MemoryStream();
var writer = PdfWriter.GetInstance(document, output);
// Open the Document for writing
document.Open();
string contents = System.IO.File.ReadAllText(Server.MapPath("~/Reports/Original.html"));
var parsedHtmlElements = HTMLWorker.ParseToList(new StringReader(contents), null);
foreach (var htmlElement in parsedHtmlElements)
    document.Add(htmlElement as IElement);
document.Close();
Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", string.Format("attachment;filename=Receipt-{0}.pdf", "Report"));
Response.BinaryWrite(output.ToArray());
return View();
How can I generate multiple PDFs?

You are outputting the bytes as a response, so you will never be able to generate 2 different files in one response. There is only one response per request.
If you want the user to download 2 different PDFs at the same time, you could call the controller using JavaScript from the view, once per file.

iTextSharp - Opening a file and saving PdfDestination and PdfAction

I am trying to do something relatively simple with iTextSharp, but I always find it very confusing and can't figure out this without asking for some help.
I have a situation where a third party product I use generates a PDF, but doesn't have the option of setting initial view settings (zoom, fit to width, etc).
I have found some code that will allow me to do this in iTextSharp:
Developer Barn
The bit I cannot work out is how to apply this to a file that already exists - this seems to be fine for any new files, or something I am creating in iTextSharp, but not an existing PDF. Is there a way of doing this, and how can it be done?
Many thanks in advance,
Adam
PS - Already found the answer to this.. StackOverflow won't let me close my own question though? Seems a bit daft, but anyway do it like this -
PdfReader reader = new PdfReader(new FileStream(fileName, FileMode.Open, FileAccess.Read));
Rectangle size = reader.GetPageSizeWithRotation(1);
using (Document document = new Document(size))
{
    using (PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(Path.Combine(Path.GetDirectoryName(fileName), "Zoom" + Path.GetFileName(fileName)), FileMode.Create, FileAccess.ReadWrite)))
    {
        //open our document
        document.Open();
        PdfContentByte cb = writer.DirectContent;
        //this creates a new destination to send the action to when the document is opened.
        PdfDestination pdfDest = new PdfDestination(PdfDestination.FITH, reader.GetPageSize(1).Top);
        //create a new action to send the document to our new destination.
        PdfAction action = PdfAction.GotoLocalPage(1, pdfDest, writer);
        for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
        {
            //need to change page size for landscape / portrait
            document.SetPageSize(reader.GetPageSizeWithRotation(pageNumber));
            document.NewPage();
            PdfImportedPage page = writer.GetImportedPage(reader, pageNumber);
            cb.AddTemplate(page, 0, 0);
        }
        //set the page mode
        int PageMode = 0;
        PageMode += PdfWriter.PageLayoutOneColumn;
        //set the open action for our writer object
        writer.SetOpenAction(action);
        writer.ViewerPreferences = PageMode;
        writer.SetFullCompression();
        //finally, close our document
        document.Close();
    }
}

I don't think there is an edit functionality per se, neither in iTextSharp nor in iText. I think the way to go is to open the existing document, create a new writer, copy the old document into the new writer while adding the enrichments you'd like to see, and overwrite the original doc afterwards as described here.
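The "copy, enrich, then overwrite" flow can be sketched independently of iText. The Java sketch below is only an illustration of the file handling: the class `InPlaceRewrite`, the method `rewrite`, and the `transform` parameter are hypothetical, and `transform` stands in for whatever PDF library code produces the enriched document bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.UnaryOperator;

final class InPlaceRewrite {
    /**
     * Reads the original file fully, writes the transformed bytes to a
     * temporary file in the same directory, then replaces the original.
     */
    static void rewrite(Path original, UnaryOperator<byte[]> transform) {
        try {
            byte[] input = Files.readAllBytes(original);
            Path dir = original.toAbsolutePath().getParent();
            Path temp = Files.createTempFile(dir, "rewrite-", ".tmp");
            try {
                Files.write(temp, transform.apply(input));
                // Only swap the files once the new content is completely written.
                Files.move(temp, original, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                // No-op after a successful move; cleans up if something failed.
                Files.deleteIfExists(temp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing to a temporary file first means the original is left untouched if the transformation fails halfway, and reading the input fully before writing avoids holding the original open while it is being replaced.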
