Is this TOC scenario possible using C# and iTextSharp?

I've just downloaded iTextSharp and before I put a lot of effort into this I'd like to know if this scenario is possible with it. We have a client that is insisting that their SSRS report PDFs contain a table of contents, preferably with page numbers. The various components of these reports have highly variable lengths so we can't hard code actual page numbers. As you all probably know, there is no direct way to create a Table of Contents in SSRS. (We've even had a special session with the Microsoft rep about this.)
What I would like to do is as follows:
Mark the target locations in the SSRS report by setting their DocumentMapLabel property.
Generate the PDF in the usual fashion, either from the report server or a ReportViewer control. (This will be in C#.)
Open the PDF in my hypothetical code.
Insert a blank page at or near the front.
Scan the PDF for DocumentMapLabels (and, ideally, detect which page they're on).
Populate the blank page with links to the various sections.
Is this possible?

I wouldn't use your design. As soon as the TOC needs more than one page, you're in trouble. Maybe you're confident that this won't happen today, but what if that's needed tomorrow?
You have different options:
Create your document in one go. Add the TOC at the end. Reorder the pages before closing the document.
Create a document (e.g. in memory) using named destinations for the targets. Create a document (e.g. in memory) with the TOC referring to the named destinations. Merge the two documents into one document, consolidating the named destinations.
Create a document with bookmarks (this will result in a bookmarks panel to the left in Adobe Reader). Then read the bookmarks to build the TOC as a separate PDF, and merge that TOC PDF with the document that has the bookmarks.
All of this is documented.
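As a rough illustration of the first option, and of why a multi-page TOC matters, here is a minimal sketch of the page arithmetic only. It does not call iTextSharp; the class and method names (TocPlanner, TocPageCount, MoveTocToFront, DisplayedPage) and the idea of a fixed number of entries per TOC page are assumptions made for this example. The resulting order array is the kind of input a page-reordering call would expect, but check the library documentation for the exact method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TocPlanner
{
    // Number of pages the TOC needs, assuming (hypothetically) that a fixed
    // number of entries fits on each TOC page.
    public static int TocPageCount(int entryCount, int entriesPerPage)
    {
        if (entriesPerPage <= 0) throw new ArgumentOutOfRangeException("entriesPerPage");
        return (entryCount + entriesPerPage - 1) / entriesPerPage;
    }

    // 1-based page order that moves the last tocPages pages to the front.
    // Example: 7 pages in total, TOC on the last 2 -> 6, 7, 1, 2, 3, 4, 5.
    public static int[] MoveTocToFront(int totalPages, int tocPages)
    {
        if (tocPages < 0 || tocPages > totalPages) throw new ArgumentOutOfRangeException("tocPages");
        int bodyPages = totalPages - tocPages;
        IEnumerable<int> toc = Enumerable.Range(bodyPages + 1, tocPages);
        IEnumerable<int> body = Enumerable.Range(1, bodyPages);
        return toc.Concat(body).ToArray();
    }

    // Page number to print in the TOC once the TOC sits in front of the body.
    public static int DisplayedPage(int bodyPage, int tocPages)
    {
        return bodyPage + tocPages;
    }
}
```

Because the TOC length is computed from the number of entries, nothing breaks when the TOC grows past one page: the order array and the displayed page numbers both shift by the same amount.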
In The Best iText Questions on StackOverflow, you'll find the answers to these (and many other) questions:
How can I add titles of chapters in ColumnText? (this sounds exactly like what you need)
Create Index File(TOC) for merged pdf using itext library in java
PDF Page re-ordering using itext
How to reorder the pages of a PDF file?
What you want to do is possible, but not the way you describe it. Read the book, pick an option and post a new question if you have a problem with the option you picked. Just download that book; it's free of charge.
Note: iText(Sharp) is free software, NOT freeware. This means that it is only free of charge if you agree with the open source license (the AGPL). It is not free of charge in all situations as explained in this video. That's also important to know before you start an iText(Sharp) project.

Related

Best way to generate PDF in c# using Word or InDesign?

I'm comfortable generating Word documents using Aspose.Word (which can also save as a PDF) but I've recently been asked to do the same thing using a PDF as the starter template. We recently bought Aspose.Total and whilst Aspose.Pdf looks like it can do some manipulations it doesn't look to be all that flexible/easy (like adding a big line of text and getting it to wrap, and shifting other content down the page if it takes up more space).
What would be the best way of using a PDF as a template for what is basically a bit like a mail merge from a database? Should I turn it into a PDF form and merge it from an XML data source? Is this even viable or would such a form still have a limitation on spacing (so that longer lines/paragraphs of text won't reflow the document where necessary)?
From what I can tell it doesn't look like InDesign can be manipulated in c# even via a COM object (which would be nasty on a web server anyway).
If I recreated the InDesign/PDF as a Word document I'm sure I could work wonders, but you know what these publishing types are like, who think Word documents are the tool of the devil. These PDFs are never going to a professional printer anyway; they're just brochures for a client to download from a web page (based on information in a database) for printing/use at home.
There are indeed many solutions for a web-to-print project like this. Choosing one is a matter of budget, requirements and user count. At its simplest, dynamic content can be placed using PDF forms filled from XML data.
On the other hand, you can work with InDesign Server and output PDFs based on InDesign templates. That's generally a good choice when a large number of users need to get rich PDF files in parallel. But the costs are high.
You could also consider PitStop Server or Callas pdfToolbox Server to place dynamic text based on variables that you supply. The advantage is that you don't need much coding; those apps are ready to use.
Finally, you can consider command-line tools. A few of them, such as pdfTk or cPdf, may have useful commands for merging text.

Concatenating Word documents and converting them to PDF

What is the best way to merge multiple documents and convert them to PDF? We also need to insert blank pages for every odd page.
A fully supported, server-side automated version of this (mostly baked into the MS camp though) involves using the OpenXML SDK to do any field inserts, then using SharePoint's Word Automation Services (SP 2010) to convert the documents to PDF, and then using your favorite PDF toolkit (iTextSharp for me) for any post-processing (merging documents, inserting blank pages, or images that must be positioned relative to specific pages).
The reason for doing the document merge in PDF rather than OpenXML is simplicity - you don't have to deal with merging styles, headers etc.
The reason for doing the blank pages and image insertion in PDF is that OpenXML has no idea how to render the content, and so it has no idea where page breaks would occur naturally (you can still insert breaks like you would in Word though).
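To make the blank-page step concrete, here is a minimal sketch of the planning logic only, with no PDF library involved. It assumes one possible reading of the requirement: every document with an odd page count is followed by a blank page, so that the next document starts on an odd page of the merged file. The names PageSlot, DuplexPadding and BuildPlan are hypothetical; a PDF toolkit would then copy or create the pages in the planned order.

```csharp
using System;
using System.Collections.Generic;

public struct PageSlot
{
    public int DocIndex;   // index into the input list, or -1 for a blank page
    public int PageNumber; // 1-based page within that document, or 0 for a blank page
    public bool IsBlank { get { return DocIndex < 0; } }
}

public static class DuplexPadding
{
    // pageCounts[i] is the page count of the i-th converted PDF.
    public static List<PageSlot> BuildPlan(IList<int> pageCounts)
    {
        var plan = new List<PageSlot>();
        for (int d = 0; d < pageCounts.Count; d++)
        {
            for (int p = 1; p <= pageCounts[d]; p++)
                plan.Add(new PageSlot { DocIndex = d, PageNumber = p });

            // Pad odd-length documents so the next one starts on an odd page.
            // No padding is added after the last document.
            bool isLast = d == pageCounts.Count - 1;
            if (pageCounts[d] % 2 == 1 && !isLast)
                plan.Add(new PageSlot { DocIndex = -1, PageNumber = 0 });
        }
        return plan;
    }
}
```

The page counts are only known after conversion to PDF, which is why this step belongs in post-processing rather than in OpenXML.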
If you are using C# and you are OK with a server based solution then have a look at this post. It uses a .net friendly web services interface.
There is an optional SharePoint version available as well, but as you did not include a SharePoint tag I assume that won't be of interest to you.
Full disclosure, I wrote that post.

Using C# to create a Word document from an existing template

I have several Word templates and I wish to use these to dynamically create Word documents in my app. I wish to avoid using automation at all costs as this is no good. I know that I can use both HTML and XML to create Word documents but I just don't know where to start with regard to using a template that may well have images in the footer or the header of a document.
I use the OpenXML SDK with Word 2007. After you get the hang of it, it's not so bad. I have several template docx files that I scan through to search and replace for placeholder strings with what I want, and then can stitch together multiple templates into one document if I want to. It's nice because I can start with docx files as the template and modify them while the whole time staying within the realm of the docx format. If an image is in the docx when you start modifying it, it'll be there after you re-save it after modification (provided you didn't programmatically remove it of course).
If you have more details with what you'll be doing, let us know.
You could use DocX. It's free, very easy to use, with nice tutorials, and is feature rich. It works only with DOCX documents though. Also, development is currently on hold until the author finishes his semester. Here's a detailed blog about it.
It has a good example of using a template in its Invoice Example.
MigraDoc http://www.pdfsharp.net/MigraDocOverview.ashx is a free utility for exporting PDF/Word/HTML files. I've not worked with it using templates as yet; however, you could use the DDL files to persist a layout for your files to be re-used.

How can I programmatically create PDF bookmarks from PDF file?

So, I have used Pdf995's PDF print driver from a web browser to print web pages and eventually use PdfEdit995 to join these various PDF files into one large PDF.
Now I have a lot of large PDF documents that I wish to add bookmarks to, but am hoping there is a relatively easy way of doing this programmatically (using C#, preferably) - basically, I want to find, within each PDF, text that is large enough to qualify as a header, and use that text as the bookmark.
Any tips/advice/direction? Thanks!
It's definitely possible to do this, but I would recommend finding a PDF library that does most of the leg work. Technically you could do it all yourself with the aid of the PDF specification, but that'd probably take more time than it's worth.
The library will need to let you find text in a document, return the page, size, font, etc. of that text, and programmatically create bookmarks (also known as outlines) based on that information.
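Once a library has extracted the text along with its page and font size, choosing bookmark candidates is a simple filter. The sketch below shows that step only; the TextRun type, the HeadingFinder class and the font-size threshold are assumptions for illustration and do not belong to any particular PDF library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of what a text-extraction step might hand back.
public class TextRun
{
    public string Text;
    public int Page;
    public float FontSize;
}

public static class HeadingFinder
{
    // Keeps runs whose font size is at least minFontSize, ordered by page.
    // OrderBy is stable, so runs on the same page keep their input order.
    public static List<TextRun> FindHeadings(IEnumerable<TextRun> runs, float minFontSize)
    {
        return runs
            .Where(r => r.FontSize >= minFontSize && !string.IsNullOrWhiteSpace(r.Text))
            .OrderBy(r => r.Page)
            .ToList();
    }
}
```

Each returned run would become one bookmark whose title is the text and whose target is the page. A fixed threshold is a crude rule; documents that mix body sizes may need a threshold relative to the most common font size.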
My company's product, Quick PDF Library, can help you do this and so can PDFKit.NET. I'm sure there are other libraries out there that support this functionality too. As far as free libraries go, from what I've seen I don't believe that PDFSharp or iText will meet all of your requirements in this case, but I'm sure someone will correct me if I'm wrong.
If you'd prefer to develop a solution for this entirely yourself, then the PDF reference is available online for free.

Is there a way to replace a text in a PDF file with itextsharp?

I'm using itextsharp to generate the PDFs, but I need to change some text dynamically.
I know that it's possible to change text if there's an AcroField, but my PDF doesn't have any. It just has some plain text and I need to change some of it.
Does anyone know how to do it?
Actually, I have a blog post on how to do it! But like IanGilham said, it depends on whether you have control over the original PDF. The basic idea is that you set up a form on the page and replace the form fields with the text you want. (You can style the form so it doesn't look like a form.)
If you don't have control over the PDF, let me know how to do it!
Here is a link to the full post:
Using a template to programmatically create PDFs with C# and iTextSharp
I haven't used itextsharp, but I have been using PDFNet SDK to explore the content of a large pile of PDFs for localisation over the last few weeks.
I would say that what you require is absolutely achievable, but how difficult it is will depend entirely on how much control you have over the quality of the files. In my case, the files can be constructed from any combination of images, text in any random order, tables, forms, paths, single pixel graphics and scanned pages, some of which are composed from hundreds of smaller images. Let's just say we're having fun with it.
In the PDFTron way of doing things, you would have to implement a viewer (sample available), and add some code over a text selection. Given the complexities of the format, it may be necessary to implement a simple editor in a secondary dialog with the ability to expand the selection to the next line (or whatever other fundamental object is used to make up text). The string could then be edited and applied by copying the entire page of the document into a new page, replacing the selected elements with your new string. You would probably have to do some mathematics to get this to work well though, as just about everything in PDF is located on the page by means of an affine transform.
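For the mathematics mentioned above: a PDF transformation matrix is written as six numbers [a b c d e f] and maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The sketch below implements just that arithmetic; the PdfMatrix type and its method names are made up for this example and are not part of PDFNet or iTextSharp.

```csharp
using System;

public struct PdfMatrix
{
    public double A, B, C, D, E, F;

    public PdfMatrix(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    // Maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
    public void Apply(double x, double y, out double tx, out double ty)
    {
        tx = A * x + C * y + E;
        ty = B * x + D * y + F;
    }

    // Returns the matrix that applies this transform first, then 'next'.
    public PdfMatrix Then(PdfMatrix next)
    {
        return new PdfMatrix(
            A * next.A + B * next.C,
            A * next.B + B * next.D,
            C * next.A + D * next.C,
            C * next.B + D * next.D,
            E * next.A + F * next.C + next.E,
            E * next.B + F * next.D + next.F);
    }
}
```

For instance, scaling by 2 with (2, 0, 0, 2, 0, 0) and then translating with (1, 0, 0, 1, 10, 20) sends the point (1, 2) to (12, 24). Replacement text has to be positioned with the same combined matrix as the text it replaces, otherwise it will land somewhere else on the page.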
Good luck. I'm sure there are people on here with some experience of itextsharp and PDF in general.
This question comes up from time to time on the mailing list. The same answer is given time and time again - NO. See this thread for the official answer from the person who created iText.
This question should be a FAQ on the itextsharp tag wiki.
