I am trying to get events from a PDF document and handle them in my C# code, using the samples that come with the Acrobat SDK.
I still don't understand how to do it. I haven't found a class that provides the events; all the classes I have seen expose only methods. I am probably missing something.
Can somebody help?
My use case is:
The user opens a PDF document alongside my application (or my app triggers opening the PDF document).
When the user selects some text in the PDF document, my app should receive an event.
My event handler processes the selection and gets the selected text.
It puts a bookmark on the selection in the PDF document (with additional attributes).
The PDF document retains these bookmarks when it is saved.
The bookmarks in the PDF remain editable.
A different app should be able to parse the PDF and retrieve these bookmarks along with their additional attributes.
I hope I haven't asked too much.
I use iTextSharp to create a PDF with form data based on another PDF.
The problem is that the generated file (the form in it) is not editable.
If I use iTextSharp in append mode, the form is editable, but most of the form data is not preserved. I want the user to see the resulting PDF with the form data preserved.
I understand that there is NOTHING I can do: the only way for the user to edit the resulting PDF is to open it in a paid version of Acrobat, because I CHANGE the PDF file by entering form data and setting fonts on it.
Is there something I can do?
Paul
Your question isn't very clear, but here are some answers to similar questions that have been asked before:
End users can't edit a form locally unless the form is "reader-enabled". Making a form reader-enabled is only possible when you use Adobe software: "Adding Enable for commenting Adobe Reader" using Acrobat
You need to fill out reader-enabled forms in append mode if you don't want to break the reader-enabling: Pdf with Acroform editing using iText
This doesn't mean you can't ask people to fill out a PDF form to gather data. See
Edit pdf embedded in the browser and save the pdf directly to server
You can capture that data, and fill out the form without flattening if you want to serve this form (including the data) to the end user: How to fill out a pdf file programmatically?
I'm pretty sure one of these questions is a duplicate of what you're asking, but since your question isn't clear, it's hard to mark it as an exact duplicate of one of them.
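The reason append mode matters: reader-enabling is stored as a usage-rights signature computed over the file's bytes, so rewriting the whole file invalidates it. Append mode (an incremental update in PDF terms) leaves the original bytes untouched and only adds the changed objects, a new xref section and a new trailer whose `/Prev` entry points at the previous xref table. The sketch below shows that mechanism in plain C# rather than with iText. `IncrementalUpdate.Append` is a hypothetical helper, and it assumes the original file ends with a classic `xref` table (not a cross-reference stream), that the replaced object has generation 0, and that the caller passes the previous trailer's entries (such as `/Root`).

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class IncrementalUpdate
{
    // Hypothetical helper (not an iText API): appends a new version of one object
    // after the original bytes, leaving those bytes exactly as they were.
    // Assumes a classic xref table, generation 0 and an ASCII-only object body.
    public static byte[] Append(byte[] original, int objectNumber, string newObjectBody,
                                int size, string trailerEntries)
    {
        // ASCII decoding maps every byte to one char, so indices match byte offsets.
        string text = Encoding.ASCII.GetString(original);
        int sx = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (sx < 0) throw new InvalidDataException("No startxref found.");

        // Offset of the previous xref table; it becomes /Prev in the new trailer.
        string after = text.Substring(sx + "startxref".Length).TrimStart();
        int end = 0;
        while (end < after.Length && char.IsDigit(after[end])) end++;
        long prevXref = long.Parse(after.Substring(0, end), CultureInfo.InvariantCulture);

        var sb = new StringBuilder();
        sb.Append('\n');                              // start the update on a fresh line
        long objOffset = original.Length + 1;         // byte right after that newline
        sb.Append($"{objectNumber} 0 obj\n{newObjectBody}\nendobj\n");

        // The new xref section lists only the object that changed.
        long xrefOffset = original.Length + sb.Length;
        sb.Append($"xref\n{objectNumber} 1\n{objOffset:D10} 00000 n \n");
        sb.Append($"trailer\n<< /Size {size} {trailerEntries} /Prev {prevXref} >>\n");
        sb.Append($"startxref\n{xrefOffset}\n%%EOF\n");

        byte[] update = Encoding.ASCII.GetBytes(sb.ToString());
        var result = new byte[original.Length + update.Length];
        Buffer.BlockCopy(original, 0, result, 0, original.Length);
        Buffer.BlockCopy(update, 0, result, original.Length, update.Length);
        return result;
    }
}
```

In iTextSharp, passing `true` as the `append` argument of the `PdfStamper` constructor makes the library write this kind of update for you.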
Short answer: No
PDF files are generally treated as final, read-only documents, and that is why everyone uses them. Most of the time a PDF is converted from some other source file, so if you can get that original file instead of the PDF, that would be a good move.
From past experience, I can confirm that iTextSharp may not convert all your data properly, which can make the generated file unusable. Even when it does, you may get some stray lines or changes in the document's behavior (e.g. fields that are no longer editable).
If you really want to take a PDF file as input and process it, you will need to understand its internal structure:
PDF file format: http://resources.infosecinstitute.com/pdf-file-format-basic-structure/
This can be a hell of a ride, so you might want to reconsider using a PDF as input. If you can't change that, you might need some sort of Adobe plug-in to do it; a lot of third-party PDF libraries do that.
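To give an idea of what that inner structure looks like, here is a minimal sketch in C# (standard library only, no PDF library) that writes the four parts of a classic PDF file by hand: the `%PDF` header, a body of numbered objects, the cross-reference (`xref`) table holding each object's byte offset, and the trailer that points at the root object and at the table. `MinimalPdf` is a hypothetical name, and the output is deliberately limited to one blank page with no fonts, content streams or compression.

```csharp
using System.Collections.Generic;
using System.Text;

// Sketch only: writes the four parts of a classic PDF file for a single blank page.
// Real files add content streams, fonts, compression, incremental updates, etc.
public static class MinimalPdf
{
    public static byte[] Build()
    {
        // Body: numbered indirect objects. Object N is referenced as "N 0 R".
        string[] objects =
        {
            "<< /Type /Catalog /Pages 2 0 R >>",            // 1: document root
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",    // 2: page tree
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>" // 3: Letter page
        };

        // Header. Everything written here is ASCII, so string length == byte offset.
        var sb = new StringBuilder("%PDF-1.4\n");
        var offsets = new List<int>();
        for (int i = 0; i < objects.Length; i++)
        {
            offsets.Add(sb.Length);
            sb.Append($"{i + 1} 0 obj\n{objects[i]}\nendobj\n");
        }

        // Cross-reference table: byte offset of every object, 20 bytes per entry.
        int xrefOffset = sb.Length;
        sb.Append($"xref\n0 {objects.Length + 1}\n");
        sb.Append("0000000000 65535 f \n");                 // entry 0: head of the free list
        foreach (int offset in offsets)
            sb.Append($"{offset:D10} 00000 n \n");

        // Trailer: tells a reader where the root object and the xref table are.
        sb.Append($"trailer\n<< /Size {objects.Length + 1} /Root 1 0 R >>\n");
        sb.Append($"startxref\n{xrefOffset}\n%%EOF\n");
        return Encoding.ASCII.GetBytes(sb.ToString());
    }
}
```

Something like `File.WriteAllBytes("blank.pdf", MinimalPdf.Build())` should produce a file that viewers open as a single blank page. Changing any object by hand shifts the byte offsets that the xref table records, which is one reason PDF tools rebuild or append xref data whenever they save.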
Good luck
I am working on an application that needs to create a document, populated with data captured in my WPF application, to attach to an email and send to an insurance company.
My client currently does this by sending an editable PDF document to their clients, proofreading the completed form when it comes back, and manually attaching it to an email.
I initially thought of creating a Word document laid out in the same format as the existing one and automating Word to find/replace placeholders such as ... etc.
However, it would be great if I could populate the existing PDF documents instead. Does anyone know if that is possible?
Thanks.
You can use PDFsharp/MigraDoc to edit PDF files. Parsing the existing document and inserting text/images is simple enough, but I don't know whether there is any support for actual placeholders.
EDIT: Found this approach, using iTextSharp and form fields: Using itextsharp (or any c# pdf library), how to open a PDF, replace some text, and save it again?
I am working on a feasibility study for an application that can capture text from PDF files. The simple use case can be summarized as:
The user selects text in a PDF document (using Acrobat Reader or another PDF reader).
A "selection completed" event should be available to the observing .NET application.
Upon selection, the user can specify some further properties (such as category/level), and this information is tagged onto the selected text inside the PDF file itself.
The selected text should remain highlighted, in a color that depends on the other parameters (such as category/level) selected in the .NET application.
A separate application should be able to parse the PDF file and gather this data.
A similar application already works with MS Word files.
Edit:
The basic requirement is that there should be some way to notify the .NET application when the user selects some text in the PDF document. The other requirement is that there should be a way to add a tag to the selected text.
Can somebody suggest an API or resource for such an implementation?
Take a look at Amyuni PDF Creator .NET:
The SelectedObjectChange event should be enough for detecting new text being selected.
The DoCommandTool method combined with acCommandToolHighlight command tool can be used to activate the text highlight tool.
In the resulting file, you can enumerate all objects on a page and identify the highlighted text by using the object type acObjectTypeHighlight.
You can get free support during your evaluation time.
Usual disclaimer applies
I am trying to obtain html from the WebBrowser control, but it must include the value attributes of input elements on the page as well.
If I use webBrowser.DocumentText, I get the full HTML of the page as it was initially loaded. The input field values are not included.
If I use webBrowser.Document.Body.OuterHtml, I get the values, but not the other contents of (), which I need so I can get the stylesheet links, etc.
Is there a clean dependable way to obtain the full HTML of the DOM in its current state from the WebBrowser? I am passing the HTML to a library for it to be rendered to PDF, so suggestions for programmatically saving from the WebBrowser control to PDF will also be appreciated.
Thanks
There are some undocumented ways (changing the registry, an undocumented DLL export) to print the document to XPS or PDF printers without parsing the page, that is, if you can afford to roll out the required printer drivers to your customer's network.
If you want to parse the web page, documentElement.outerHTML should give you the full canonicalized document, but not the linked image, script or stylesheet files. For those, you need to parse the page, enumerate the elements, check their types and collect the resource URLs, then dig into the WinInet cache or download the additional resources. To get the documentElement property, cast HtmlDocument.DomDocument to mshtml.IHTMLDocument3 if you use Windows Forms, or cast WebBrowser.Document to mshtml.IHTMLDocument3 if you use WPF. If you need to wait for Ajax code to finish executing, start a timer when the DocumentComplete event is raised.
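For the "collect the resource URLs" step, relative `href`/`src` values have to be resolved against the page address before you can look them up in the cache or download them. Below is a minimal sketch with `System.Uri`, assuming you have already gathered the raw attribute values (for example from `link`, `script` and `img` elements) while enumerating the DOM. `ResourceUrls.Resolve` is a hypothetical helper; if the page declares a `<base href>`, that URL should be used as the base instead of the page address.

```csharp
using System;
using System.Collections.Generic;

public static class ResourceUrls
{
    // Hypothetical helper: turns raw href/src attribute values into absolute
    // http(s) URLs, dropping empty values, non-HTTP schemes and duplicates.
    public static List<string> Resolve(string pageUrl, IEnumerable<string> rawValues)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();
        var seen = new HashSet<string>(StringComparer.Ordinal);

        foreach (string raw in rawValues)
        {
            if (string.IsNullOrWhiteSpace(raw)) continue;

            // Resolves "css/site.css", "../img/x.png", "/js/app.js" against the base.
            if (!Uri.TryCreate(baseUri, raw.Trim(), out Uri abs)) continue;

            // Only http(s) resources can be fetched or found in the WinInet cache.
            if (abs.Scheme != Uri.UriSchemeHttp && abs.Scheme != Uri.UriSchemeHttps) continue;

            if (seen.Add(abs.AbsoluteUri)) result.Add(abs.AbsoluteUri);
        }
        return result;
    }
}
```

For a page at `http://example.com/app/page.html`, the values `css/site.css`, `../img/logo.png` and `/js/app.js` resolve to `http://example.com/app/css/site.css`, `http://example.com/img/logo.png` and `http://example.com/js/app.js`.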
At this stage, I would parse the HTML DOM and get the necessary data in order to generate a report via a template, so you always have the option to generate other formats supported by the report engine, such as Microsoft Word. Very rarely I need to render the HTML as parsed, for example, printing a long table without adding customized header and footer on each page. That said, you can check Convert HTML to PDF in .NET and test which one of the suggested software/components works best with your target web site, if you do not have long tables.
I'm currently using Aspose PDF Kit to split a 'master PDF' up into individual documents + thumbnails. This works well at the moment, but the device I'll be rendering the PDF on won't know about the annotations/links within the PDF.
I understand there is a way to parse the PDF document to detect the X/Y position of a hyperlink, etc. Is there a simple way to extract/iterate over the document data so I can write it to an external XML file?
You may want to try Docotic.Pdf library for this (disclaimer: I work for Bit Miracle).
The library can be used to retrieve all hyperlinks in a document. You may retrieve bounding box, text and other properties of a link, too.
Please take a look at "Extract text from link target" sample. It may help you to get started.
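Once your library has given you each link's rectangle, keep in mind that link annotation rectangles are expressed in PDF user space: units of 1/72 inch with the origin at the bottom-left of the page, while a rendering device usually expects top-left-origin pixels. The sketch below converts such a rectangle and writes the result to XML with `System.Xml.Linq`. `LinkInfo`, `LinkExport` and the XML element/attribute names are hypothetical; fill `LinkInfo` from whatever your PDF library returns. It assumes an unrotated page whose visible area starts at (0, 0).

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical record: fill it from your PDF library's link annotations.
// Llx/Lly/Urx/Ury are PDF user-space points with the origin at the bottom-left.
public record LinkInfo(int Page, double Llx, double Lly, double Urx, double Ury,
                       double PageHeight, string Uri);

public static class LinkExport
{
    // Converts a PDF rectangle to a top-left-origin box scaled to the target DPI.
    // Assumes an unrotated page whose visible area starts at (0, 0).
    public static (double X, double Y, double W, double H) ToDevice(LinkInfo l, double dpi)
    {
        double s = dpi / 72.0;                  // PDF user space: 72 units per inch
        return (l.Llx * s,
                (l.PageHeight - l.Ury) * s,     // flip the y axis
                (l.Urx - l.Llx) * s,
                (l.Ury - l.Lly) * s);
    }

    // Writes one <link> element per hyperlink, with coordinates in device pixels.
    public static XDocument ToXml(IEnumerable<LinkInfo> links, double dpi)
    {
        string F(double v) => v.ToString("0.##", CultureInfo.InvariantCulture);

        return new XDocument(
            new XElement("links",
                links.Select(l =>
                {
                    var (x, y, w, h) = ToDevice(l, dpi);
                    return new XElement("link",
                        new XAttribute("page", l.Page),
                        new XAttribute("x", F(x)), new XAttribute("y", F(y)),
                        new XAttribute("width", F(w)), new XAttribute("height", F(h)),
                        new XAttribute("uri", l.Uri));
                })));
    }
}
```

For example, a link with rectangle [72 700 216 720] on a 792-point-high page comes out at x=72, y=72, width=144, height=20 at 72 DPI.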