I have a PDF template without AcroFields and I need to replace text in it. The text is formatted like this: ((aFieldToReplace)). There are also tables that need to be filled with a variable number (n) of rows.
Is there any good tutorial, resource or sample to find?
The question "Is there a way to replace a text in a PDF file with itextsharp?" asks more or less the same thing, but its answer ignores the "no AcroFields" part of the question.
EDIT:
To make it even harder, I have multiple templates that I can use. The templates all have their own formatting style (font, color, ...).
EDIT 2:
The purpose is to create a report from data in a database. The data comes from several forms in an ASP.NET MVC application.
The report could have several layouts depending on the chosen template.
It should be possible to add templates dynamically, so I can't create the layout from scratch. I really need to get the layout from a template.
Quoting the excellent iText in Action:
In a PDF document, every character or glyph on a PDF page has its fixed position, regardless of the application that’s used to view the document.
[…]
Suppose you want to replace the word “edit” with the word “manipulate” in a sentence, you’d have to reflow the text. You’d have to reposition all the characters that follow that word. Maybe you’d even have to move a portion of the text to the next page. That’s not trivial, if not impossible.
[…]
Don’t expect any tool to be able to edit a PDF file the same way you’d edit a Word document.
PDF is a document display format. If you want templating you'll probably have to use something else.
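To make the quoted point concrete, here is a minimal, language-neutral sketch in Python. It does not touch a real PDF; it only models words stored with fixed x positions. The fixed-width font (`GLYPH_WIDTH`) and all function names are assumptions made up for the illustration.

```python
# Toy model: each word is stored with a fixed x position, as in a PDF page.
# Assumption: a hypothetical fixed-width font where every glyph advances 6 units.
GLYPH_WIDTH = 6

def lay_out(words, start_x=0):
    """Return (word, x) pairs; positions are computed once and then frozen."""
    placed, x = [], start_x
    for word in words:
        placed.append((word, x))
        x += (len(word) + 1) * GLYPH_WIDTH  # the word plus one space
    return placed

def replace_in_place(placed, old, new):
    """Swap the text of one word but keep every stored position unchanged."""
    return [(new if w == old else w, x) for w, x in placed]

def overlaps(placed):
    """List adjacent word pairs whose drawn extents collide."""
    hits = []
    for (w1, x1), (w2, x2) in zip(placed, placed[1:]):
        if x1 + len(w1) * GLYPH_WIDTH > x2:
            hits.append((w1, w2))
    return hits

original = lay_out("you can edit this sentence".split())
naive = replace_in_place(original, "edit", "manipulate")
reflowed = lay_out([w for w, _ in naive])  # recompute every position

print(overlaps(naive))     # the longer word runs into its neighbour
print(overlaps(reflowed))  # no collisions once everything is repositioned
```

The replacement only looks right after every following word has been repositioned, which is the reflow step the book describes. A real page adds variable glyph widths, line breaks and page breaks on top of this.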
@Frederiek:
If you can spend a bit of money, this will do exactly what you want. Check out the demo, it's quite cool. It can reflow the text, replace images, etc. Quite nice.
http://www.iceni.com/infixServer.htm
Let me know if that works for you.
Related
I was wondering whether it would at all be possible to have our creative department design a nice-looking PDF template for our client, e.g. a fancy letterhead, then supply it to me so I could inject various types of content into the body using PDFSharp or MigraDoc.
Currently we generate the header and footer content as part of the rendering process, and it works very well, but as you can imagine, any non-trivial layout and styling is pretty complicated to pull off in what is essentially a 2D graphics environment.
So the thought arose as to whether one of these tools would be able to take a pre-existing PDF, give me access to various objects, and allow me to e.g. replace certain text placeholders or manipulate the PDF "DOM" in a more intelligent fashion.
Something similar to working with Spreadsheets (binary and XML versions) or OpenXML, etc.
What we do: take an existing PDF page, draw it at the bottom (Z axis) of a new PDF page, and then use MigraDoc to add other contents to the page.
PDFsharp can also be used to draw on top.
The template PDF pages are used like letter heads with the corporate design of a customer and the final document will have as many pages as needed.
I'm creating a docx file using the DocumentFormat OpenXML API, appending text, tables, images, etc. At some point I'd like to know how much "space" (rows, pixels, whatever) I have left before I reach the end of the page.
Is it possible to get this information? The page size, formatting, margins, etc, are all there - everything that's needed to calculate this is in the document.
I do understand that the document itself doesn't deal with formatting, how many pages are there etc, but if not from the OpenXML API, is there some other way to find this out? Maybe some dummy formatter that reads this docx and can be used to find out the 'formatting' data - where is currently the last character positioned on a page, etc... is there such a thing?
I'm not aware of such a "dummy formatter". Such a thing would need to replicate Word's page layout model (including line spacing, how it hyphenates words, how it calculates space for the header/footer, etc.). That might be relatively straightforward for simple text-only documents, but for a full solution you have tables, columns, images, text boxes and so on to worry about.
Can you automate Word and ask it?
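For the simple text-only case, a back-of-the-envelope estimate might look like the sketch below. This is not Word's layout model and it is not part of the OpenXML API; it is plain arithmetic under loud assumptions: page height and margins are given in twentieths of a point (the unit used by `w:pgSz` and `w:pgMar`), every line has the same height, and `line_pitch` is a value you supply yourself. The function names are hypothetical.

```python
def lines_per_page(page_height, margin_top, margin_bottom, line_pitch):
    """Rough count of uniform text lines that fit between the margins.

    All arguments are in twentieths of a point (twips). line_pitch is the
    assumed distance between baselines; it is NOT read from the document.
    """
    usable = page_height - margin_top - margin_bottom
    return usable // line_pitch

def lines_left(lines_used, **page):
    """Lines remaining on the current page, never negative."""
    return max(lines_per_page(**page) - lines_used, 0)

# US Letter (15840 twips high) with 1-inch (1440 twip) margins and an
# assumed pitch of 276 twips (13.8 pt) per line.
letter = dict(page_height=15840, margin_top=1440,
              margin_bottom=1440, line_pitch=276)

print(lines_per_page(**letter))
print(lines_left(40, **letter))
```

Anything beyond uniform paragraphs (tables, images, headers and footers, widow/orphan control, wrapped lines) breaks this estimate, which is why asking Word itself is the more reliable route.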
I have a few PDF files that were created from Word or Excel files.
I need to get the information that's in the tables.
The text in the document is not an image so I'm able to extract the text using tools such as pdfbox.
When I have the text I have no way of knowing what cells in the table it belongs to because I don't know where the table borders are.
I've tried a few desktop tools such as ABBYY or Solid PDF Converter, and they are able to convert the files into nice Word documents, but this doesn't suit my needs as I want to be able to do this programmatically in C#.
Some of the tables have nested tables, which I think makes this a little more difficult.
I appreciate your help
The difficulty here is caused by the fact that the text in the PDF is not contained within any table. It might look like it is, but underneath the surface, it is not.
So there are a couple of options that I can think of. But none of them are going to be quite as satisfying as you'd probably like.
There are some companies that offer SDKs for PDF to Excel/Word conversion. Investintech and Iceni are a couple of examples. But these solutions are not free.
If you know the exact layout of the PDF files that you need to extract the table data from, then you can use any SDK that lets you extract text from a PDF and also tells you the exact co-ordinates of the extracted text. Using this method you need to know in advance where the text is going to be, so that you can extract text from a specific area on the page. It obviously won't work if you need to process any random document.
It's a difficult task, but hopefully this will give you a starting point.
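The coordinate-based option can be sketched roughly as follows. The sketch is in Python and independent of any PDF SDK: it assumes you already have text fragments as `(x, y, text)` tuples from whatever extraction library you use, and that you know the column and row boundaries of the table in advance. The edge values, the sample data and the function names are invented for the illustration.

```python
from bisect import bisect_right

def band_index(edges, value):
    """Index of the band [edges[i], edges[i+1]) containing value, or None."""
    i = bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None

def fragments_to_table(fragments, col_edges, row_edges):
    """Bucket (x, y, text) fragments into a grid of cell strings.

    col_edges and row_edges are ascending coordinates of the known cell
    borders. PDF y grows upward, so the highest band becomes row 0.
    """
    n_rows, n_cols = len(row_edges) - 1, len(col_edges) - 1
    cells = [[[] for _ in range(n_cols)] for _ in range(n_rows)]
    # Visit fragments in reading order: top to bottom, then left to right.
    for x, y, text in sorted(fragments, key=lambda f: (-f[1], f[0])):
        col = band_index(col_edges, x)
        band = band_index(row_edges, y)
        if col is None or band is None:
            continue  # fragment lies outside the known table area
        cells[n_rows - 1 - band][col].append(text)
    return [[" ".join(parts) for parts in row] for row in cells]

# Hypothetical layout: two columns and two rows at known positions.
col_edges = [50, 200, 350]
row_edges = [600, 650, 700]
fragments = [
    (210, 630, "3"), (60, 680, "Name"), (60, 630, "Widget"),
    (210, 680, "Qty"), (400, 630, "page footer"),
]

for row in fragments_to_table(fragments, col_edges, row_edges):
    print(row)
```

Nested tables would need a second pass with their own edges inside the parent cell, and none of this helps if the layout is not known beforehand.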
I'm trying to create a web script that will let me alter PDF templates that I have uploaded and re-output them. I have already tried Zend, which lets me write to a PDF, but that means leaving certain spaces in the PDF blank, which is too primitive for what I need. PDFFlip was not any better.
We need to implement functionality so we can remove content from the PDF as well as remove and replace. I have looked at CAM::PDF and changepagestring.pl but I'm not sure it's up to the job. I was hard pressed to find any real usage examples and Perl is not a language I have used before.
This is for a web project but I am flexible with the language we use, ideally PHP or ASP.NET C# would be great. Preferably not Java unless there is no other way.
I should also point out that I looked through the FoxitReader SDK without any luck. I never tried to implement it but I found no mention of find and replace like functionality.
You can tinker with PDF text, but a simple search and replace is not straightforward. PDF is designed as an end-file format, not for easy editing. I wrote a blog post explaining some of the issues at http://pdf.jpedal.org/java-pdf-blog/bid/12670/PDF-text
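One of the issues is easy to show with a hand-written fragment of a page content stream. In the Python sketch below, the stream bytes are made up for the illustration; the word "Invoice" is drawn with the `TJ` operator, which takes an array of string pieces separated by kerning adjustments, so the word never appears as one contiguous run of bytes. The helper names are hypothetical, and the parsing is deliberately simplistic: it ignores escaped parentheses, hex strings, font encodings and compressed streams.

```python
import re

# Made-up, uncompressed content stream: one word shown in three kerned pieces.
content_stream = b"BT /F1 12 Tf 72 700 Td [(Inv) -15 (oi) 20 (ce)] TJ ET"

def naive_contains(stream, word):
    """What a plain search-and-replace would test first."""
    return word.encode("latin-1") in stream

def shown_text(stream):
    """Join the literal string pieces inside each TJ array.

    Simplification: only unescaped, non-nested (...) strings are handled.
    """
    texts = []
    for array in re.findall(rb"\[(.*?)\]\s*TJ", stream, flags=re.S):
        pieces = re.findall(rb"\(([^()\\]*)\)", array)
        texts.append(b"".join(pieces).decode("latin-1"))
    return texts

print(naive_contains(content_stream, "Invoice"))  # the raw bytes never contain it
print(shown_text(content_stream))                 # the pieces do spell the word
```

Even after the word has been found this way, writing a replacement back means choosing new pieces and positions, and the embedded font may not contain every glyph the new text needs.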
Maybe, as a workaround, it's better to keep and fill in templates in a format that is more convenient for editing? E.g., you can keep your templates as Microsoft Word templates and export them to PDF after filling them in. This thread may be useful for that.
The PDF file format isn't really appropriate for editing.
Alternatively, you can prepare your templates as PDFs containing form fields. Filling in form fields is a common, well-known task, and there are a lot of PDF components for it.
I'm using itextsharp to generate the PDFs, but I need to change some text dynamically.
I know that it's possible to change text if there is an AcroField, but my PDF doesn't have any. It just has plain text, and I need to change some of it.
Does anyone know how to do it?
Actually, I have a blog post on how to do it! But like IanGilham said, it depends on whether you have control over the original PDF. The basic idea is that you set up a form on the page and replace the form fields with the text you want. (You can style the form so it doesn't look like a form.)
If you don't have control over the PDF, let me know how to do it!
Here is a link to the full post:
Using a template to programmatically create PDFs with C# and iTextSharp
I haven't used itextsharp, but I have been using PDFNet SDK to explore the content of a large pile of PDFs for localisation over the last few weeks.
I would say that what you require is absolutely achievable, but how difficult it is will depend entirely on how much control you have over the quality of the files. In my case, the files can be constructed from any combination of images, text in any random order, tables, forms, paths, single pixel graphics and scanned pages, some of which are composed from hundreds of smaller images. Let's just say we're having fun with it.
In the PDFTron way of doing things, you would have to implement a viewer (sample available), and add some code over a text selection. Given the complexities of the format, it may be necessary to implement a simple editor in a secondary dialog with the ability to expand the selection to the next line (or whatever other fundamental object is used to make up text). The string could then be edited and applied by copying the entire page of the document into a new page, replacing the selected elements with your new string. You would probably have to do some mathematics to get this to work well though, as just about everything in PDF is located on the page by means of an affine transform.
Good luck. I'm sure there are people on here with some experience of itextsharp and PDF in general.
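On the "some mathematics" part: a PDF transformation matrix is written as six numbers `[a b c d e f]`, and a point maps as x' = a*x + c*y + e, y' = b*x + d*y + f. The Python sketch below is not PDFNet or itextsharp code, just the arithmetic; the function names and sample matrices are assumptions for the illustration.

```python
def apply(m, point):
    """Map a point through a PDF-style matrix [a b c d e f]."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)

def compose(first, second):
    """Matrix equivalent to applying `first` and then `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )

scale2 = (2, 0, 0, 2, 0, 0)      # scale by 2 in both directions
shift = (1, 0, 0, 1, 100, 50)    # translate by (100, 50)

print(apply(compose(scale2, shift), (10, 20)))  # scale first, then shift
print(apply(compose(shift, scale2), (10, 20)))  # shift first, then scale
```

The two results differ, so the order in which matrices are concatenated matters when you work out where a replaced string should land on the page.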
This question comes up from time to time on the mailing list. The same answer is given time and time again - NO. See this thread for the official answer from the person who created iText.
This question should be a FAQ on the itextsharp tag wiki.