Page number of a PDF as written in the footer - C#

Does anyone have an idea how Acrobat Reader knows the number of the page I am currently viewing?
For example, I have a PDF file whose footers contain page numbers in the format 1/A or 1/1 (the numbering is divided into chapters), and Acrobat knows that this is the page number: it doesn't just show the count of pages from the beginning of the document to the current page.
I am writing a C# WinForms application in which I need to get the page number the same way Acrobat does.
Currently I convert each page into text with the pdfLibView library, but then I have to look at the last page and use complex algorithms to find out which string in the footer is the page number, because the footer can contain other information as well.
So, any idea how to get the page number in the format in which it is written on the page?

It seems like your document uses the page labels feature defined in the PDF specification.
Section 8.3.1, "Page Labels", says:
In addition, a document may optionally define page labels (PDF 1.3) to
identify each page visually on the screen or in print. Page labels and
page indices need not coincide: the indices are fixed, running
consecutively through the document starting from 0 for the first page,
but the labels can be specified in any way that is appropriate for the
particular document. For example, if the document begins with 12 pages
of front matter numbered in roman numerals and the remainder of the
document is numbered in arabic, the first page would have a page index
of 0 and a page label of i, the twelfth page would have index 11 and
label xii, and the thirteenth page would have index 12 and label 1.
You might try the Docotic.Pdf library if you want to access page label information in an existing document (disclaimer: I work for the vendor of the library).
Here is a sample showing how to add page labels to a PDF document. It doesn't show how to access existing labels, but it might give you some clues to start with.
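To illustrate what the quoted section means in practice: page labels are stored in the /PageLabels number tree of the document catalog, where each entry maps the zero-based index of the first page of a range to a label dictionary with an optional style (/S: D for decimal, R or r for upper- or lowercase roman numerals, A or a for upper- or lowercase letters), a prefix (/P) and a starting number (/St, default 1). Below is a minimal, library-independent sketch of how a viewer could turn a page index into its label once those entries have been read. Reading the number tree itself is assumed to be done with a PDF library, and the `LabelRange` and `PageLabels` names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// One entry of the /PageLabels number tree (hypothetical type).
// StartIndex: zero-based index of the first page in the range (the number-tree key).
// Style: the /S value ('D', 'R', 'r', 'A' or 'a'), or null for a prefix-only label.
// Prefix: the /P value; FirstNumber: the /St value (1 if absent).
public record LabelRange(int StartIndex, char? Style, string Prefix = "", int FirstNumber = 1);

public static class PageLabels
{
    // Returns the label of the page at pageIndex; ranges must be sorted by StartIndex.
    public static string GetLabel(IReadOnlyList<LabelRange> ranges, int pageIndex)
    {
        // The applicable range is the last one that starts at or before pageIndex.
        var range = ranges.LastOrDefault(r => r.StartIndex <= pageIndex);
        if (range == null)
            return (pageIndex + 1).ToString(); // fallback (assumption): 1-based page number

        int value = range.FirstNumber + (pageIndex - range.StartIndex);
        string numeric = range.Style switch
        {
            'D' => value.ToString(),
            'R' => ToRoman(value),
            'r' => ToRoman(value).ToLowerInvariant(),
            'A' => ToLetters(value),
            'a' => ToLetters(value).ToLowerInvariant(),
            _ => "" // no /S entry: the label is the prefix alone
        };
        return range.Prefix + numeric;
    }

    private static string ToRoman(int n)
    {
        int[] values = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
        string[] symbols = { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };
        var sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            while (n >= values[i])
            {
                sb.Append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.ToString();
    }

    // Letter style: A..Z for 1..26, AA..ZZ for 27..52, AAA..ZZZ for 53..78, and so on.
    private static string ToLetters(int n)
    {
        char letter = (char)('A' + (n - 1) % 26);
        return new string(letter, (n - 1) / 26 + 1);
    }
}
```

With the ranges from the quoted example (lowercase roman numerals from index 0, decimal from index 12), index 11 yields "xii" and index 12 yields "1". A range such as `new LabelRange(7, 'D', "A-", 8)` labels index 7 as "A-8". This only helps if the file actually defines page labels; text that is merely printed in the footer is not reflected by them.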

Related

Differentiate between a blank page and an X-Cross page in a PDF file using iText7 (C#)

I am splitting a PDF document into multiple PDF documents using iText7. For example, I have a PDF document that contains a mix of page types:
Page 1 - Page with X-Cross symbol
Page 2 - Blank Page
Page 3 - Page with Text
Page 4 - Page with X-Cross symbol
Page 5 - Blank Page
Page 6 - Page with Text
Page 7 - Page with X-Cross symbol
When I try to read the text of each page, pages 1 and 2 both return empty text.
My question is: how can I distinguish a blank page from a page with the X-Cross symbol? Any help would be greatly appreciated.
In a comment you explained that the X-Cross symbol is actually a bitmap image. Thus, to check whether there is such a symbol on a page, you have to apply bitmap image extraction, not text extraction. There are a number of questions and answers on bitmap image extraction on Stack Overflow, for example this answer by Alexey Subach, who is on the iText 7 development team.
If you are lucky, the blank pages really are blank (and don't contain e.g. a purely white or purely transparent bitmap image). In that case you only need to check whether or not a page has (a) any text (which you already check) and whether or not it has (b) any bitmap:
If it has neither, it is a Blank Page.
If it has only an image, it is a Page with X-Cross symbol.
If it has text, it is a Page with Text.
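The three rules above can be written as a small decision function. This is a sketch, assuming your existing iText 7 code already gives you the extracted text of a page and the number of bitmap images found on it; `PageKind` and `PageClassifier` are hypothetical names, and no iText API is called here.

```csharp
// Hypothetical page categories from the question.
public enum PageKind { Blank, XCross, Text }

public static class PageClassifier
{
    // extractedText: result of your existing text extraction for the page.
    // imageCount: number of bitmap images your image extraction found on the page.
    public static PageKind Classify(string extractedText, int imageCount)
    {
        if (!string.IsNullOrWhiteSpace(extractedText))
            return PageKind.Text;   // any text: page with text
        if (imageCount > 0)
            return PageKind.XCross; // only an image: page with X-Cross symbol
        return PageKind.Blank;      // neither: blank page
    }
}
```

Checking for text first mirrors the order of the rules: a page that has text counts as a text page whether or not it also contains images.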
If things are more complicated, you'll have to look more closely, e.g. analyze the bitmaps found on a page. If all those X-Cross symbol bitmaps are identical, you may for example compare the bitmap images you find with a reference image you extracted beforehand.
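One way to compare found images with a reference image is to fingerprint their bytes. A minimal sketch, assuming you already extract each image's raw bytes with your iText 7 image extraction code and that every X-Cross symbol is stored with exactly the same bytes; `XCrossDetector` is a hypothetical name. `SHA256.HashData` and `Convert.ToHexString` require .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class XCrossDetector
{
    // Hex-encoded SHA-256 of the image bytes; byte-identical images give identical fingerprints.
    public static string Fingerprint(byte[] imageBytes) =>
        Convert.ToHexString(SHA256.HashData(imageBytes));

    // True if any image on the page is byte-identical to the reference X-Cross image.
    public static bool HasXCross(IEnumerable<byte[]> pageImages, string referenceFingerprint) =>
        pageImages.Any(img => Fingerprint(img) == referenceFingerprint);
}
```

Computing the reference fingerprint once and comparing short strings keeps the check cheap across many pages. If the symbols look identical but are encoded differently (for example re-compressed), byte comparison fails and you would need to compare decoded pixels instead.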

MigraDoc: How to add a blank page after a section on odd pages?

I want to create a PDF document for double-sided printing to save paper on a report.
After a section that ends on an odd page, there should be a blank page reading "This page is intentionally left blank.", so that the report can be split by chapters as needed.
Any hints how to do this?
MigraDoc creates blank pages as needed. Just indicate in the PageSetup that you are creating a double-sided layout.
The tricky part is adding the text "This page is intentionally left blank". I think I'd add that text with PDFsharp after rendering the page with MigraDoc. But I'd rather leave the page blank, assuming the reader is smart enough to realize it was left blank to get a double-sided layout.
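The rule behind those blank pages is simple: if every section must start on an odd (right-hand) page, a section that ends on an odd page is followed by one blank page. A minimal sketch, assuming you know how many pages each section occupies after rendering, that computes which page numbers end up blank, i.e. the pages you would later stamp with the text using PDFsharp. It does not call MigraDoc or PDFsharp, and `DoubleSidedLayout.BlankPages` is a hypothetical helper.

```csharp
using System.Collections.Generic;

public static class DoubleSidedLayout
{
    // sectionPageCounts: number of content pages in each section, in order.
    // Returns the 1-based page numbers of the blank pages needed so that
    // every section after the first starts on an odd (right-hand) page.
    public static List<int> BlankPages(IReadOnlyList<int> sectionPageCounts)
    {
        var blanks = new List<int>();
        int lastPage = 0; // last page number used so far
        for (int s = 0; s < sectionPageCounts.Count; s++)
        {
            lastPage += sectionPageCounts[s];
            bool hasNextSection = s < sectionPageCounts.Count - 1;
            // A section ending on an odd page is followed by a blank even page.
            if (hasNextSection && lastPage % 2 == 1)
            {
                lastPage++;
                blanks.Add(lastPage);
            }
        }
        return blanks;
    }
}
```

For sections of 3, 2 and 1 pages, only page 4 is blank. The last section gets no trailing blank page here; whether you want one depends on how the printed report is bound.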

Print only the first page of each group

My report adds a page break and resets a custom page number whenever a group changes. How can I print only the first page of each group?
Crystal itself doesn't have a great way for you to programmatically print specific pages, but there are a few ways to make this work:
Easy: Make a formula that generates a list of which pages are ones you want to print. (It might output something like 1,2,5,7,9, which you could copy and paste into the Page Range when you go to print.)
Medium: Make a modified version of this report that excludes any data you don't want to print. Figure out a maximum number of records that fit on a page (for example, 18) and only take the first 18 per group. (This might be easier to set up with a custom SQL statement in Crystal.)
Difficult: Same as option 2, but you write suppression formulas for everything on your report: if the group's page number is greater than 1, the entire page goes blank. After printing, you just remove the blank pages from the file. Tedious, but no ink wasted...
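For the "Easy" option, the page list is just a running total of group lengths. A minimal C# sketch of that arithmetic, done outside Crystal and assuming you know how many pages each group spans (for example from where the custom page number resets); `PrintRange.FirstPagesOfGroups` is a hypothetical helper, not a Crystal function.

```csharp
using System.Collections.Generic;

public static class PrintRange
{
    // groupPageCounts: how many pages each group spans, in report order.
    // Returns a Page Range string listing the first page of every group.
    public static string FirstPagesOfGroups(IEnumerable<int> groupPageCounts)
    {
        var firstPages = new List<int>();
        int nextPage = 1; // page on which the next group starts
        foreach (int count in groupPageCounts)
        {
            firstPages.Add(nextPage);
            nextPage += count;
        }
        return string.Join(",", firstPages);
    }
}
```

Groups spanning 1, 3, 2, 2 and 1 pages produce "1,2,5,7,9", which can be pasted into the Page Range field of the print dialog.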

Footer not appearing on pages with programmatically added content

I have a base document with sections created by programmatically inserting content from a separate template document. The insertion works, but the footer doesn't appear on any extra pages created as a result of the insertion, i.e. the first page has a footer, but page two (created by inserting content) does not. If the original document has two pages, it renders with a footer on the first two pages, but not on the third.
Is there a way I can force the footer to render on all of the pages I have created?
There are three types of headers/footers in a section:
Header/Footer for First page,
Primary Header/Footer which can also be used for Odd numbered pages and
Header/Footer for Even numbered pages.
So, if you want to keep the same header/footer across all pages of a Word document, you can first clear all headers/footers (see the Section.HeadersFooters.Clear() method) from all sections in the document and then build and assign a single primary header/footer to the first section.
You might also want to turn off/on 'Different First Page' and 'Different Odd & Even Pages' options using 'Section.PageSetup.DifferentFirstPageHeaderFooter' and 'Section.PageSetup.OddAndEvenPagesHeaderFooter' properties.
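The following is a simplified model, written in plain C# rather than Aspose.Words code, of why this advice works: each page picks one of the three footer types listed above, and a section that does not define that type shows the footer of the nearest previous section that does (Word's "link to previous" behavior). `FooterType`, `SectionModel` and `FooterResolver` are hypothetical names, and real documents involve more rules than this sketch captures.

```csharp
using System.Collections.Generic;

public enum FooterType { First, Primary, Even }

// Hypothetical model of a section: which footer types it defines and its
// 'Different First Page' / 'Different Odd & Even Pages' settings.
public record SectionModel(HashSet<FooterType> Defined, bool DifferentFirstPage, bool OddAndEven);

public static class FooterResolver
{
    // Which footer type a page uses, based on the section's settings.
    public static FooterType TypeForPage(SectionModel s, bool firstPageOfSection, int pageNumber)
    {
        if (s.DifferentFirstPage && firstPageOfSection) return FooterType.First;
        if (s.OddAndEven && pageNumber % 2 == 0) return FooterType.Even;
        return FooterType.Primary;
    }

    // Index of the section whose footer is actually shown: a section that does not
    // define the needed type falls back to the nearest previous section that does.
    public static int? SourceSection(IReadOnlyList<SectionModel> sections, int sectionIndex, FooterType type)
    {
        for (int i = sectionIndex; i >= 0; i--)
            if (sections[i].Defined.Contains(type)) return i;
        return null; // no footer of this type in this or any earlier section
    }
}
```

With all footers cleared, a single primary footer on the first section, and both options turned off, every page of every later section resolves to that one footer. If an inserted section turns on 'Different First Page' without a first-page footer anywhere, its first page resolves to no footer at all.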
Also, using Aspose.Words, you can programmatically control how headers and footers appear when joining and appending documents.
I work with Aspose as Developer Evangelist.

Converting HTML to PDF does not produce an exact replica

I am doing R&D on converting HTML to PDF.
We have created a page in ASP.NET and placed a CKEditor on it with simple options for selecting font, font size, bold, italic, etc. There are two more text boxes where the user can enter the height and width of the PDF to be generated. In addition, we have a div that shows a preview of the content entered in the editor. The div's height and width are set at run time; basically, this div is meant to show how the PDF is going to look.
We are using the wkhtmltopdf executable to generate the PDF.
My problem is that the generated PDF is not an exact replica of the content shown in the div: sometimes it matches line by line, but sometimes some words wrap to the next line in the PDF.
We have tried many things to get an exact result, without success. Any help is appreciated.
One thing you might try is using DOMPDF:
https://code.google.com/p/dompdf/
It's updated pretty frequently and has always given me great results.
Another suggestion would be to create styles specific to the PDF to get the desired result and then have those applied when the PDF is created.
In other words, if you have a "generate pdf" button, have it take all the HTML content and insert PDF-specific style tags, tested with your PDF generator, that make the rendering match the browser. You could do this by simply replacing:
</head>
With
<style>STYLES SPECIFIC TO PDF</style>
</head>
when the PDF is created.
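A minimal C# sketch of that replacement, assuming the editor's HTML is available as a string before it is handed to wkhtmltopdf; `PdfHtml.InjectPdfStyles` is a hypothetical helper and the CSS argument stands in for the rules you have tested. It inserts before the first `</head>`, ignoring case, and prepends the style block when there is no head element (an assumption, not something the answer specifies).

```csharp
using System;

public static class PdfHtml
{
    // Inserts a PDF-only <style> block just before the closing </head> tag.
    public static string InjectPdfStyles(string html, string pdfCss)
    {
        string styleTag = "<style>" + pdfCss + "</style>";
        int headEnd = html.IndexOf("</head>", StringComparison.OrdinalIgnoreCase);
        if (headEnd < 0)
            return styleTag + html; // no </head> found: prepend the styles (assumption)
        return html.Insert(headEnd, styleTag);
    }
}
```

Because only the copy sent to the PDF generator is modified, the browser preview keeps its own styles.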
In my experience, PDF rendering usually requires a few extra rules to make it look like the output of web rendering engines.
Hope that helps.
