Interactive API with PDF document - c#

I am working on a feasibility study for an application that can capture text from a PDF. The simple use case can be summarized as:
The user selects text in a PDF document (using Acrobat Reader or another PDF reader).
A "selection completed" event should be available to the .NET application that is observing the reader.
Upon selection, the user can specify some further properties (like category/level), and this information is stored together with the selected text inside the PDF file itself.
The selected text should stay highlighted. The highlight color depends on other parameters (like category/level) selected in the .NET application.
A separate application should be able to parse the PDF file and gather this data.
A similar application is already working with MS Word files.
Edit:
The basic requirement is that there should be some way to notify the .NET application when the user selects text in the PDF document. The other requirement is that there should be a way to add a tag to the selected text.
Can somebody suggest an API or resource for such an implementation?
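In PDF terms, a colored highlight with tagged properties is usually a text-markup annotation: a /Highlight annotation whose /QuadPoints cover the selected text and whose /C array sets the color. The specification also lets writers add private keys to such a dictionary (ideally with a registered prefix), and viewers generally ignore keys they don't recognize. A minimal sketch of building that dictionary, assuming a hypothetical category-to-color table and hypothetical keys /MyAppCategory and /MyAppLevel; it writes nothing into a file, so a PDF library would still have to add it as an object referenced from the page's /Annots array. Written as a C# 9 top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical mapping from the application's category to an RGB highlight color (0..1).
var categoryColors = new Dictionary<string, double[]>
{
    ["Important"] = new[] { 1.0, 0.0, 0.0 },  // red
    ["Review"]    = new[] { 1.0, 1.0, 0.0 },  // yellow
    ["Reference"] = new[] { 0.0, 0.8, 1.0 },  // light blue
};

string Num(double v) => v.ToString("0.###", CultureInfo.InvariantCulture);
// PDF literal string: backslash and parentheses are escaped.
string Lit(string s) => "(" + s.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)") + ")";

// Builds a /Highlight annotation dictionary for a one-line selection.
// rect = [llx lly urx ury] in PDF user space. The single quadrilateral is listed
// upper-left, upper-right, lower-left, lower-right, the order Acrobat is commonly
// reported to use. /MyAppCategory and /MyAppLevel are hypothetical private keys.
string BuildHighlight(double[] rect, string selectedText, string category, int level)
{
    double llx = rect[0], lly = rect[1], urx = rect[2], ury = rect[3];
    var quad = new[] { llx, ury, urx, ury, llx, lly, urx, lly };
    var sb = new StringBuilder("<< /Type /Annot /Subtype /Highlight");
    sb.Append(" /Rect [").Append(string.Join(" ", rect.Select(x => Num(x)))).Append(']');
    sb.Append(" /QuadPoints [").Append(string.Join(" ", quad.Select(x => Num(x)))).Append(']');
    sb.Append(" /C [").Append(string.Join(" ", categoryColors[category].Select(x => Num(x)))).Append(']');
    sb.Append(" /Contents ").Append(Lit(selectedText));
    sb.Append(" /MyAppCategory ").Append(Lit(category));
    sb.Append(" /MyAppLevel ").Append(level.ToString(CultureInfo.InvariantCulture));
    return sb.Append(" >>").ToString();
}

var annot = BuildHighlight(new[] { 72.0, 700, 300, 714 }, "net revenue (2013)", "Review", 2);
Console.WriteLine(annot);
```

The quadrilateral simply reuses the selection rectangle; a selection spanning several lines would append one quadrilateral (eight numbers) per line to /QuadPoints.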

Take a look at Amyuni PDF Creator .NET:
The SelectedObjectChange event should be enough to detect that new text has been selected.
The DoCommandTool method combined with the acCommandToolHighlight command tool can be used to activate the text highlight tool.
In the resulting file, you can enumerate all objects on a page and identify the highlighted text by the object type acObjectTypeHighlight.
You can get free support during your evaluation period.
Usual disclaimer applies
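For the "separate application" that gathers the data, the following rough sketch reads tags back without any SDK, assuming each highlight annotation carries hypothetical private keys /MyAppCategory (a string) and /MyAppLevel (an integer). It runs regular expressions over raw PDF text, so it only works for uncompressed files (no object streams), dictionaries without nested << >>, and strings without escaped parentheses; enumerating objects through a real PDF library, as described above, is the robust route. C# 9 top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Returns (Category, Level, Contents) for each /Highlight annotation found in raw,
// uncompressed PDF text. /MyAppCategory and /MyAppLevel are hypothetical private keys.
// Limitations: no nested dictionaries, no escaped parentheses, no object streams.
List<(string Category, int Level, string Contents)> ReadHighlights(string pdfText)
{
    var results = new List<(string Category, int Level, string Contents)>();
    // A dictionary with no nested "<<"/">>" that contains "/Subtype /Highlight".
    const string annotPattern = @"<<(?:(?!<<|>>).)*?/Subtype\s*/Highlight(?:(?!<<|>>).)*>>";
    foreach (Match m in Regex.Matches(pdfText, annotPattern, RegexOptions.Singleline))
    {
        string dict = m.Value;
        var cat = Regex.Match(dict, @"/MyAppCategory\s*\(([^)]*)\)");
        var lvl = Regex.Match(dict, @"/MyAppLevel\s+(\d+)");
        var txt = Regex.Match(dict, @"/Contents\s*\(([^)]*)\)");
        results.Add((cat.Success ? cat.Groups[1].Value : "",
                     lvl.Success ? int.Parse(lvl.Groups[1].Value) : 0,
                     txt.Success ? txt.Groups[1].Value : ""));
    }
    return results;
}

// Made-up, uncompressed sample: a page object and one tagged highlight annotation.
string sample = "1 0 obj << /Type /Page /Annots [5 0 R] >> endobj\n" +
                "5 0 obj << /Type /Annot /Subtype /Highlight /C [1 1 0] /Contents (net revenue) " +
                "/MyAppCategory (Review) /MyAppLevel 2 >> endobj";
foreach (var h in ReadHighlights(sample))
    Console.WriteLine($"{h.Category} / level {h.Level}: {h.Contents}");
```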

Related

iTextSharp create PDF from another and add form data

I use iTextSharp to create a PDF with form data based on another PDF.
The problem is that the form in the generated file is not editable.
If I use iTextSharp in append mode, the form stays editable, but most of the form data is not preserved. I want the user to see the resulting PDF with the form data preserved.
I understand there is NOTHING I can do. The only way for the user to edit the resulting PDF is to open it in a paid version of Acrobat. This is because I CHANGE the PDF file by entering form data and setting fonts on it.
Is there something I can do?
Paul
Your question isn't very clear, but here are some answers to similar questions that have been asked before:
End users can't edit a form locally unless the form is "reader-enabled". Making a form reader-enabled is only possible with Adobe software: "Adding Enable for commenting Adobe Reader" using Acrobat
You need to fill out reader-enabled forms in append mode if you don't want to break the reader-enabling: Pdf with Acroform editing using iText
This doesn't mean you can't ask people to fill out a PDF form to gather data. See Edit pdf embedded in the browser and save the pdf directly to server
You can capture that data and fill out the form without flattening it if you want to serve the form (including the data) to the end user: How to fill out a pdf file programmatically?
I'm pretty sure one of these questions is a duplicate of what you're asking, but since your question isn't clear, it's hard to mark it as an exact duplicate of one of them.
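Roughly speaking, append mode works because it writes an incremental update: the original bytes, which the usage-rights signature of a reader-enabled form covers, stay untouched, and changed objects are appended after them together with a new cross-reference section whose trailer points back to the previous one via /Prev. A minimal sketch of that layout, assuming you already know the previous startxref offset, the trailer's /Root and /Size, and that the file uses a classic xref table rather than a cross-reference stream; it illustrates the file structure and is not a replacement for a library's append mode. The object number and values in the demo are placeholders. C# 9 top-level program, .NET 5 or later for Encoding.Latin1.

```csharp
using System;
using System.IO;
using System.Text;

// Appends one new or replacement object to an existing PDF as an incremental update.
// prevXref: offset from the original file's last "startxref"; size and root: the
// original trailer's /Size and /Root. Assumes generation 0 and a classic xref table.
byte[] AppendUpdate(byte[] original, int objNum, string objBody, long prevXref, int size, string root)
{
    using var ms = new MemoryStream();
    ms.Write(original, 0, original.Length);  // the original bytes are left untouched
    void Put(string s) { byte[] b = Encoding.Latin1.GetBytes(s); ms.Write(b, 0, b.Length); }

    Put("\n");
    long objOffset = ms.Position;            // xref entries hold byte offsets
    Put($"{objNum} 0 obj\n{objBody}\nendobj\n");

    long xrefOffset = ms.Position;
    Put($"xref\n{objNum} 1\n");               // one subsection containing one entry
    Put($"{objOffset:D10} 00000 n \n");       // fixed 20-byte entry
    Put($"trailer\n<< /Size {Math.Max(size, objNum + 1)} /Root {root} /Prev {prevXref} >>\n");
    Put($"startxref\n{xrefOffset}\n%%EOF\n");
    return ms.ToArray();
}

// Placeholder input: not a complete PDF, just bytes standing in for the original file.
byte[] baseFile = Encoding.Latin1.GetBytes("%PDF-1.4\n% original objects, xref and trailer\n%%EOF\n");
byte[] withUpdate = AppendUpdate(baseFile, 4, "<< /FT /Tx /T (Name) /V (Paul) >>", 9, 6, "1 0 R");
Console.WriteLine(Encoding.Latin1.GetString(withUpdate));
```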
Short answer: No
PDF files are usually treated as final, read-only documents, and that is a big part of why everyone uses them. Most of the time, a PDF is converted from some other file, so if you can get that original file instead of the PDF, that would be a good move.
From my past experience, I can confirm that iTextSharp may not convert all your data properly, which can make the generated file unusable. Even when it does, you might get stray lines or changes in the document's behavior (e.g. fields that are no longer editable).
If you really want to take a PDF file as input and work on it, you will need to understand its inner structure:
PDF file format: http://resources.infosecinstitute.com/pdf-file-format-basic-structure/
This can be a hell of a ride. You might need to reconsider using a PDF as input. If you can't change that, you might need some sort of Adobe plugin to do it; a lot of third-party PDF libraries do that.
Good luck
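As a first step into that structure (header, body of objects, cross-reference section, trailer): a reader starts from the end of the file, where the last startxref keyword gives the byte offset of the cross-reference data. A small sketch that reads the header version and that offset, and reports whether the offset points to a classic xref table (newer files may use a cross-reference stream instead); it assumes the header is at the very start of the file and is not a full parser. C# 9 top-level program, .NET 5 or later.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Reads the header version and the offset stored after the last "startxref",
// and checks whether that offset points to a classic "xref" table.
(string Version, long StartXref, bool ClassicXref) ReadSkeleton(byte[] pdf)
{
    string text = Encoding.Latin1.GetString(pdf);  // one char per byte: index == byte offset
    var header = Regex.Match(text, @"^%PDF-(\d\.\d)");
    if (!header.Success) throw new InvalidDataException("no %PDF- header at the start");
    int at = text.LastIndexOf("startxref", StringComparison.Ordinal);
    if (at < 0) throw new InvalidDataException("no startxref keyword");
    var offset = Regex.Match(text.Substring(at), @"^startxref\s+(\d+)");
    if (!offset.Success) throw new InvalidDataException("malformed startxref");
    long start = long.Parse(offset.Groups[1].Value);
    bool classic = start < text.Length &&
                   text.Substring((int)start).StartsWith("xref", StringComparison.Ordinal);
    return (header.Groups[1].Value, start, classic);
}

if (args.Length > 0)
{
    var (version, startXref, isClassic) = ReadSkeleton(File.ReadAllBytes(args[0]));
    Console.WriteLine($"PDF {version}; cross-reference data at byte {startXref}; " +
                      (isClassic ? "classic xref table" : "probably a cross-reference stream"));
}
```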

Fill up form in Word/Pdf

What I want to build is a fill-up form, i.e. a registration form for a hotel, in a Word or PDF file. For example, after the user fills in the name text box and clicks Next, another text box appears.
It also needs a preview before printing, ideally shown at the same time as the form.
I really want to know where to start; a sample, if there is one, would be great.
The way I would approach this problem is to collect the data with a web form and, after the user confirms, generate the report straight away. I don't think manipulating a Word or PDF document is the best approach:
Check the user input with your web form.
Generate the Word or PDF document on your reporting server (connecting to it with WCF, for instance).
Give the user a link to the generated document. It is then up to the user to print it or just view it in the browser.
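To show what the generation step comes down to without a reporting library, here is a minimal sketch that validates a few hypothetical registration fields and writes a one-page PDF using the standard Helvetica font; the cross-reference offsets are computed from the text actually written. It assumes ASCII-only values and one short line per field (no wrapping), and in the setup suggested above it would run on the server behind the web form. C# 9 top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical checks on the registration data captured by the web form.
List<string> Validate(IDictionary<string, string> form)
{
    var errors = new List<string>();
    foreach (var field in new[] { "Guest name", "Check-in", "Nights" })
        if (!form.TryGetValue(field, out var v) || string.IsNullOrWhiteSpace(v))
            errors.Add($"{field} is required");
    if (form.TryGetValue("Nights", out var n) && !int.TryParse(n, out _))
        errors.Add("Nights must be a whole number");
    return errors;
}

// Escapes a value for use inside a PDF literal string ( ... ).
string Esc(string s) => s.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)");

// Writes a one-page PDF with one "Field: value" line per entry (ASCII only, no wrapping).
byte[] BuildPdf(IDictionary<string, string> form)
{
    // Content stream: Helvetica 12 pt, starting near the top-left, 16 pt line spacing.
    var content = new StringBuilder("BT /F1 12 Tf 72 760 Td 16 TL\n");
    foreach (var kv in form)
        content.Append("(" + Esc(kv.Key + ": " + kv.Value) + ") Tj T*\n");
    content.Append("ET");
    string stream = content.ToString();

    var objects = new[]
    {
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R " +
            "/Resources << /Font << /F1 5 0 R >> >> >>",
        $"<< /Length {Encoding.ASCII.GetByteCount(stream)} >>\nstream\n{stream}\nendstream",
        "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    };

    var pdf = new StringBuilder("%PDF-1.4\n");
    var offsets = new List<int>();
    for (int i = 0; i < objects.Length; i++)
    {
        offsets.Add(pdf.Length);  // ASCII only, so character index == byte offset
        pdf.Append($"{i + 1} 0 obj\n{objects[i]}\nendobj\n");
    }
    int xref = pdf.Length;
    pdf.Append($"xref\n0 {objects.Length + 1}\n0000000000 65535 f \n");
    foreach (var off in offsets) pdf.Append($"{off:D10} 00000 n \n");  // 20-byte entries
    pdf.Append($"trailer\n<< /Size {objects.Length + 1} /Root 1 0 R >>\nstartxref\n{xref}\n%%EOF\n");
    return Encoding.ASCII.GetBytes(pdf.ToString());
}

var data = new Dictionary<string, string>
{
    ["Guest name"] = "Jane Doe", ["Check-in"] = "2014-05-01", ["Nights"] = "3",
};
var problems = Validate(data);
if (problems.Count == 0) File.WriteAllBytes("registration.pdf", BuildPdf(data));
else Console.WriteLine(string.Join("\n", problems));
```

In practice a reporting tool or PDF library would produce the layout; the design point is that the document is generated once, from data that has already been validated, rather than having the user edit a Word or PDF file.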

Automate editable PDF

I am working on an application that requires me to create a document, populated with data captured in my WPF application, to attach to an email and send to an insurance company.
My client currently does this by sending an editable PDF document to clients, proofreading the completed form when it comes back, and manually attaching it to an email.
I initially thought of creating a Word document laid out in the same format as the existing one and automating Word using find/replace on placeholders such as etc...
However, it would be great if I could populate the existing PDF documents directly. Does anyone know if that's possible?
Thanks.
You can use PDFsharp/MigraDoc to edit PDF files. Parsing the existing document and inserting text/images is simple enough, but I don't know whether there is any support for actual placeholders.
EDIT: Found this approach, using iTextSharp and form fields: Using itextsharp (or any c# pdf library), how to open a PDF, replace some text, and save it again?
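Whichever library fills the form fields, each value ends up as a PDF text string in the field's /V entry. A small sketch of that encoding, which matters when values such as names contain parentheses or non-ASCII characters: printable ASCII goes into a literal string with backslashes and parentheses escaped, and anything else becomes a UTF-16BE hex string starting with the FE FF byte-order mark, one of the text-string encodings the PDF specification allows. The field names and values are hypothetical. C# 9 top-level program, .NET 5 or later for Convert.ToHexString.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Encodes a form value as a PDF text string for a field's /V entry.
string ToPdfTextString(string value)
{
    // Printable ASCII: literal string; backslash and parentheses must be escaped.
    if (value.All(ch => ch >= 0x20 && ch <= 0x7E))
        return "(" + value.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)") + ")";
    // Anything else: UTF-16BE bytes preceded by the FE FF byte-order mark, as a hex string.
    byte[] utf16be = Encoding.BigEndianUnicode.GetBytes(value);  // GetBytes adds no BOM itself
    return "<FEFF" + Convert.ToHexString(utf16be) + ">";
}

// Hypothetical values captured in the WPF application, keyed by hypothetical field names.
var captured = new Dictionary<string, string>
{
    ["PolicyNumber"] = "AB-1234 (renewal)",
    ["InsuredName"] = "Zoë Müller",
};
foreach (var (fieldName, fieldValue) in captured)
    Console.WriteLine($"/T ({fieldName}) /V {ToPdfTextString(fieldValue)}");
```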

Acrobat SDK: How to get events and write event handlers in c#

I am trying to get events from a PDF document and handle them in my C# code, using the samples that come with the Acrobat SDK.
I haven't yet understood how to do this, and I haven't found a class that provides events: all the classes I have found expose methods only. I am probably missing something.
Can somebody help?
My use case is:
The user opens a PDF document and my application (or my application opens the PDF document).
When the user selects some text in the PDF document, my application should receive an event.
My event handler can process the selection and get the selected text.
Put a bookmark on the selection in the PDF document (with additional attributes).
The PDF document retains these bookmarks when it is saved.
The bookmarks in the PDF are available for editing.
A different application should be able to parse the PDF and retrieve these bookmarks along with their additional attributes.
I hope I have not asked for too much.
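For the bookmark part: a PDF bookmark is an outline item dictionary (/Title, /Parent, /Prev, /Next, /Dest and so on), and like other dictionaries it can carry extra private keys for a second program to read back, although viewers will not display them. A minimal sketch that builds one item pointing at the selection's position on its page, with hypothetical attribute keys prefixed MyApp; the page and parent object references are placeholders, and linking the item into the outline tree (/First, /Last, /Count on the parent, /Prev and /Next on siblings) is left to whatever PDF library writes the file. C# 9 top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

string Lit(string s) => "(" + s.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)") + ")";
string Num(double v) => v.ToString("0.##", CultureInfo.InvariantCulture);

// Builds an outline item (bookmark) that jumps to (left, top) on a page and carries
// extra attributes under hypothetical private keys "/MyApp<Key>". Keys should be
// plain letters/digits so that the result is a valid PDF name.
string BuildOutlineItem(string title, string pageRef, string parentRef,
                        double left, double top, IDictionary<string, string> attributes)
{
    var sb = new StringBuilder("<< /Title ").Append(Lit(title));
    sb.Append(" /Parent ").Append(parentRef);
    // Explicit destination [page /XYZ left top zoom]; null zoom keeps the current zoom.
    sb.Append($" /Dest [{pageRef} /XYZ {Num(left)} {Num(top)} null]");
    foreach (var (key, value) in attributes)
        sb.Append(" /MyApp").Append(key).Append(' ').Append(Lit(value));
    return sb.Append(" >>").ToString();
}

// "12 0 R" (page) and "3 0 R" (outline root) are placeholder object references.
var item = BuildOutlineItem("Selected: termination clause", "12 0 R", "3 0 R", 72, 540.5,
    new Dictionary<string, string> { ["Category"] = "Legal", ["Level"] = "2" });
Console.WriteLine(item);
```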

Is there a way to replace a text in a PDF file with itextsharp?

I'm using iTextSharp to generate PDFs, but I need to change some text dynamically.
I know it's possible to change text if there's an AcroField, but my PDF doesn't have any. It just has plain text, and I need to change some of it.
Does anyone know how to do it?
Actually, I have a blog post on how to do this! But as IanGilham said, it depends on whether you have control over the original PDF. The basic idea is that you set up a form on the page and replace the form fields with the text you want. (You can style the form so it doesn't look like a form.)
If you don't have control over the PDF, let me know how you do it!
Here is a link to the full post:
Using a template to programmatically create PDFs with C# and iTextSharp
I haven't used iTextSharp, but over the last few weeks I have been using PDFNet SDK to explore the content of a large pile of PDFs for localisation.
I would say that what you require is absolutely achievable, but how difficult it is will depend entirely on how much control you have over the quality of the files. In my case, the files can be constructed from any combination of images, text in any random order, tables, forms, paths, single pixel graphics and scanned pages, some of which are composed from hundreds of smaller images. Let's just say we're having fun with it.
In the PDFTron way of doing things, you would have to implement a viewer (a sample is available) and add some code that acts on a text selection. Given the complexity of the format, it may be necessary to implement a simple editor in a secondary dialog with the ability to expand the selection to the next line (or whatever other fundamental object is used to make up the text). The string could then be edited and applied by copying the entire page into a new page, replacing the selected elements with your new string. You would probably have to do some mathematics to get this to work well, though, as just about everything in a PDF is positioned on the page by means of an affine transform.
Good luck. I'm sure there are people here with experience of iTextSharp and PDF in general.
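To make the affine-transform point concrete: a PDF matrix is given as six numbers [a b c d e f] (for example by the cm and Tm operators) and maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). Matrices are concatenated by multiplication, so where a piece of text lands depends on every transform that applies to it, which is part of why replacing text in place needs this arithmetic. A small sketch of the two operations; the matrices in the demo are made-up values. C# 9 top-level program.

```csharp
using System;

// Maps (x, y) through a PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f.
(double X, double Y) Apply(double[] m, double px, double py) =>
    (m[0] * px + m[2] * py + m[4], m[1] * px + m[3] * py + m[5]);

// Product m1 x m2 in PDF's row-vector convention: applying the result is the same
// as applying m1 first and then m2.
double[] Multiply(double[] m1, double[] m2) => new[]
{
    m1[0] * m2[0] + m1[1] * m2[2],
    m1[0] * m2[1] + m1[1] * m2[3],
    m1[2] * m2[0] + m1[3] * m2[2],
    m1[2] * m2[1] + m1[3] * m2[3],
    m1[4] * m2[0] + m1[5] * m2[2] + m2[4],
    m1[4] * m2[1] + m1[5] * m2[3] + m2[5],
};

// Made-up example: text matrix "1 0 0 1 100 700 Tm" inside a "0.5 0 0 0.5 0 0 cm" scale.
// (Font size, horizontal scaling and text rise, which also affect glyphs, are ignored.)
var tm = new double[] { 1, 0, 0, 1, 100, 700 };
var ctm = new double[] { 0.5, 0, 0, 0.5, 0, 0 };
var (ox, oy) = Apply(Multiply(tm, ctm), 0, 0);  // text-space origin on the page
Console.WriteLine($"({ox}, {oy})");             // (50, 350)
```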
This question comes up from time to time on the mailing list, and the same answer is given every time: NO. See this thread for the official answer from the creator of iText.
This question should be a FAQ on the itextsharp tag wiki.
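One concrete reason for the "NO": the content stream rarely holds the visible text as one searchable string. A line that reads "Hello World" may be written as a TJ array with kerning adjustments between pieces, as in the made-up stream below, and the bytes may also go through a font-specific encoding. A rough sketch, handling only plain literal strings in a single TJ array, shows a naive search failing while the reassembled text matches; actually replacing the text would additionally mean re-laying out the glyphs. C# 9 top-level program.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Joins the literal strings of one TJ array, dropping the numeric kerning adjustments.
// Only the escapes \( \) and \\ are handled; octal codes, nested parentheses and
// font-specific encodings are not.
string TextOfTJ(string tjArray)
{
    var sb = new StringBuilder();
    foreach (Match m in Regex.Matches(tjArray, @"\(((?:\\.|[^\\)])*)\)"))
        sb.Append(Regex.Replace(m.Groups[1].Value, @"\\(.)", "$1"));
    return sb.ToString();
}

// Made-up content stream: "Hello World" drawn as three pieces with kerning in between.
string stream = "BT /F1 12 Tf 72 700 Td [(Hel) 20 (lo W) -15 (orld)] TJ ET";
string tj = Regex.Match(stream, @"\[(.*?)\]\s*TJ").Groups[1].Value;
Console.WriteLine(stream.Contains("Hello World"));  // False
Console.WriteLine(TextOfTJ(tj));                    // Hello World
```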
