Create table of contents with Pechkin html to pdf - C#

I'm currently working with the Pechkin library to create PDF files from HTML.
It all works great.
But I want to add one thing, a table of contents (TOC), and I can't get this working.
With plain wkhtmltopdf it's easy to do:
wkhtmltopdf toc --xsl-style-sheet toc.xsl index.html index.pdf
But with Pechkin it won't work. I already have bookmarks (which work in Adobe Reader), but that's not the real TOC I want.
I've tried to add
ObjectConfig().SetTocXsl("tocXslStyleSheetUri")
But it seems to have no effect.
I also tried to work with:
ObjectConfig().SetCreateToc(true);
This creates an empty PDF, because this function is obsolete.
So I get a nice PDF result, just without a table of contents. Does anyone know how to get the TOC to appear in my PDF file?
I also raised this as an issue on GitHub, but because the maintainers are not always quick to react, or don't react at all, I am asking the question here as well.

What if you create the table of contents on a separate page and pass its URL to the ObjectConfig object? I can show you the code if required.
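One way to produce such a separate TOC page is to collect the headings from the HTML yourself. The following is only a minimal sketch under stated assumptions: the class name TocBuilder, the method BuildTocHtml and the CSS class names are hypothetical, it uses only the .NET base class library and does not call Pechkin at all, and it lists heading text only (it cannot know PDF page numbers).

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Pechkin: builds a simple TOC list
// from the h1-h3 headings found in an HTML string.
public static class TocBuilder
{
    // Matches <h1>..</h1>, <h2>..</h2> and <h3>..</h3>, with optional attributes.
    private static readonly Regex Heading = new Regex(
        @"<h([1-3])[^>]*>(.*?)</h\1>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string BuildTocHtml(string html)
    {
        var sb = new StringBuilder("<ul class=\"toc\">");
        foreach (Match m in Heading.Matches(html))
        {
            string level = m.Groups[1].Value;
            // Drop inline markup such as <b> inside the heading text.
            string text = Regex.Replace(m.Groups[2].Value, "<[^>]+>", "").Trim();
            sb.Append("<li class=\"toc-h").Append(level).Append("\">")
              .Append(text)
              .Append("</li>");
        }
        return sb.Append("</ul>").ToString();
    }
}
```

The resulting fragment could be wrapped in a page of its own and saved to disk, and that page's URL is then what you would hand to the ObjectConfig object as suggested above.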

Related

iTextSharp create PDF from another and add form data

I use iTextSharp to create a PDF with form data based on another PDF.
The problem is that the form in the generated file is not editable.
If I use iTextSharp in append mode, the form is editable but most of the form data is not preserved. I want the user to see the resulting PDF with the PDF form data preserved.
I understand there is NOTHING I can do. The only way for the user to edit the resulting PDF is to use a paid Acrobat version on it. This is because I CHANGE the PDF file by entering form data and setting fonts on it.
Is there something I can do?
Paul
Your question isn't very clear, but here are some answers to similar questions that have been asked before:
End users can't edit a form locally unless the form is "reader-enabled". Making a form reader-enabled is only possible when you use Adobe software: "Adding Enable for commenting Adobe Reader" using Acrobat
You need to fill out reader-enabled forms in append mode if you don't want to break the reader-enabling: Pdf with Acroform editing using iText
This doesn't mean you can't ask people to fill out a PDF form to gather data. See: Edit pdf embedded in the browser and save the pdf directly to server
You can capture that data, and fill out the form without flattening if you want to serve this form (including the data) to the end user: How to fill out a pdf file programmatically?
I'm pretty sure one of these questions is a duplicate of what you're asking, but since your question isn't clear, it's hard to mark it as an exact duplicate of any one of them.
Short answer: No
PDF files are usually meant to be read-only, which is a large part of why they are so widely used. Most of the time a PDF is produced by converting some other file, so if you can get hold of that source file instead of the PDF, that would be a better starting point.
From my past experience, I can confirm that iTextSharp may not convert all your data properly, and this can make your generated file unusable. Even when it is usable, you might get some odd lines or changes in the document's behavior (e.g. fields are not editable anymore).
If you really want to work with a PDF file as input and modify it, you will need to understand its inner structure:
[PDF file format]
http://resources.infosecinstitute.com/pdf-file-format-basic-structure/
This can be a hell of a ride. You might need to reconsider using a PDF as input. If you can't change that, you might need to use some sort of Adobe plugin to do so. A lot of third-party PDF libraries do that.
Good luck

Search text in pdf

Can someone tell me whether AcrobatAccessLib (Acrobat Access 3.0 Type Library) in the COM references can be used for text searching in a PDF document?
It contains the class PDDom, but I don't know whether I can load a document into it, or how to work with it.
(I don't want to use iTextSharp or similar libraries. I tried it, but it did not work as I wanted: the PDF has corrupted page numbering and contains tables that span 2 pages, so iTextSharp finds the searched text on both pages instead of 1, whereas Acrobat Reader handles it well.)
EDIT: Or another question: can I use Acrobat Reader and its search module in my application?
I am working in C#.
Thanks a lot!
Try using PDFLIBNET.DLL.
That DLL has a pdfwrapper class, which provides lots of methods for getting text from a PDF. The FindText method is used to get text from a particular position, and the exportToText method gives you the content of a PDF page.
You can then search within that content.
I am using that DLL and searching PDF content without any issue.
Try it and let me know.
If money is not an issue, I would buy the Aspose PDF components. They work pretty well and are built for server usage.

How to create a word document using html written in C#

I'm creating a C# application that has to create a Word document.
I'm using Microsoft.Office.Interop.Word to do this and I've successfully managed to output some Word documents, but creating the content through code is very time-consuming work.
I noticed that Word is able to open HTML pages and show them as normal content, so I created a simple test table in HTML and inserted it into the Word document. But when I output the document the obvious happened: the tags were still there! Word did not interpret the tags as HTML. It just output exactly what I put in there.
How can I tell Word to interpret the text as HTML?
edit: (through the C# code of course)
edit 2: Please note that I'm parsing through some data to make this, so I will end up with about 4 pages of the same table/HTML, and I need to be able to tell Word to start a new page each time I've finished a loop. So an HTML-only method will probably not work.
If you're only wanting to output simple HTML content as a Word document, you could always cheat and write out the HTML content with a .doc extension.
Word will open that just fine.
If you need to add a page break, you can use a CSS page-break-before, like so:
<br style="page-break-before: always;"/>
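As a rough illustration of this approach for the repeated-table case from the question, here is a minimal sketch that writes one HTML table per data set, with the page-break element above inserted between tables, and saves the result with a .doc extension. The class name HtmlDocWriter, its methods and the file name report.doc are hypothetical, and only the .NET base class library is used.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: writes plain HTML that Word can open as a document.
public static class HtmlDocWriter
{
    // One <table> per data set; a page break precedes every table except the first.
    public static string BuildHtml(IEnumerable<string[][]> tables)
    {
        var sb = new StringBuilder("<html><body>");
        bool first = true;
        foreach (var rows in tables)
        {
            if (!first)
                sb.Append("<br style=\"page-break-before: always;\"/>");
            first = false;

            sb.Append("<table border=\"1\">");
            foreach (var row in rows)
            {
                sb.Append("<tr>");
                foreach (var cell in row)
                    sb.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
                sb.Append("</tr>");
            }
            sb.Append("</table>");
        }
        return sb.Append("</body></html>").ToString();
    }

    // Saves the HTML under the given path, e.g. one ending in ".doc".
    public static void Save(string path, IEnumerable<string[][]> tables)
    {
        File.WriteAllText(path, BuildHtml(tables), Encoding.UTF8);
    }
}
```

A call such as HtmlDocWriter.Save("report.doc", tables) would then produce the file; how Word lays out the breaks is up to Word itself.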
If you're set on using Interop, having read up a little bit, this post states that you need a converter to insert HTML, and the converters are only accessible when:
you paste HTML from the Clipboard
open/insert HTML from a file
So, this answer looks like it provides a clipboard-based solution: Adding html text to Word using Interop
However, if there's any money to spend on the project, I can heartily recommend Aspose.Words which will do all of this for you.
As requested by the OP, and to make it easier for others to find this solution, here goes the answer I posted as a comment (plus extra results from testing):
When opening an HTML file, MS Word honors the CSS properties page-break-before and page-break-after. There is a caveat, however:
In "Web design" view, page breaks are never shown (this doesn't mean that they aren't there), just like browsers don't "show" them. And Word opens HTML files in Web design view by default (which makes sense). You need to print the document or switch to some other view (typically "Print design") to see your breaks in all their glory.
So, saving an HTML file with a .doc extension is a viable solution (also tested: Word opens it properly despite the extension).
Note: all the testing was done on MS Word 2003 using this snippet: <html>asdf<br style="page-break-before: always;">new page!</html>
Don't build the document in code. Create it in Word as a template or mail merge template, and then use code to merge or replace the field data.
See this answer here
MS Word Office Automation - Filling Text Form Fields And Check Box Form Fields And Mail Merge
And See this from the mothership:
http://msdn.microsoft.com/en-us/library/ff433638.aspx
If you don't want to use an external lib, Interop is too slow for you and neither pure HTML nor mail merge template are flexible enough, you could write your content as text or HTML into one or more files (using C#), create a VBA macro in a Word document which by itself creates a second Word document, reads the content files and does any formatting you want afterwards.
You can run this macro programmatically by starting Word using the command line switch /m.
Another possible approach, if your html is xhtml (i.e. XML compliant), you could use XSLT to convert it to a Word XML format. But this would take a LOOOOOOOOOOONG time to code.
If you don't have to use HTML as the starting point you could simply build the Word XML document yourself rather than using XSLT, which would be easier. Time consuming but possible - it's something I do quite a lot in my work.
If a third party component is an option I would recommend the stuff from Aspose.
I have been pretty happy with their tools so far. The API is a little messy but everything works as one would expect.

Quick & Easy PDF Viewer Ideas

This question has been asked several times but my situation is a little different.
I have a web application written in C# where I get a string value passed to this page I'm working on. This string value represents a filename of a PDF file I need to display on this page. I'm supposed to have a left panel where I have some information displayed, and a right panel showing the contents of a PDF file. I'm using a simple table here to separate the panels. All PCs should already have Acrobat Reader installed.
My question is simple. How can I display the contents of the PDF file within this table? I don't need anything fancy. It has to be free and simple enough for a newbie like me to set up. It could even be written in jQuery/AJAX, if there even is a way.
I've looked at a Webbrowser control within an ASP.NET page, but it looks way too complicated for a simple viewer. I looked at Webparts, but I'm not sure if that's doable in a non-Sharepoint environment. Suggestions?
Have you tried the object tag? More ideas are here: Make PDF display inline instead of separate Acrobat Reader window

Adding a Table to a pre-existing PDF using iTextSharp/iText

I need to add a table using iTextSharp (or even PDFSharp if it can do it) to an existing PDF template at a particular location in the template. I can edit the existing template with Adobe Designer 7.0. How can I go about doing this? Is there an analog of the ASP.NET PlaceHolder that can be used here?
Keith
After some experimenting: yes, you can do so, but with a caveat. First, put a text field down somewhere in the template. In the code, grab the coordinates of where it is. Then build some sort of object to insert there, a table in my case. The caveat is that there is no flow: the document will not adjust itself. If your object is too big, it will overwrite (overflow?) what was already there. I was hoping for a nice re-flow of the existing content, but no dice. I'm finding that PDFs are a pretty static beast. But I think I'll get it to work for my needs.
