I'm trying to create a web script that will allow me to alter PDF templates that I have uploaded and re-output them. I have tried Zend already which allows me to write to a PDF but that means leaving the PDF blank in certain space which is to primitive for what I need. PDFFlip was not any better.
We need to implement functionality so we can remove content from the PDF as well as remove and replace. I have looked at CAM::PDF and changepagestring.pl but I'm not sure it's up to the job. I was hard pressed to find any real usage examples and Perl is not a language I have used before.
This is for a web project but I am flexible with the language we use, ideally PHP or ASP.NET C# would be great. Preferably not Java unless there is no other way.
I should also point out that I looked through the FoxitReader SDK without any luck. I never tried to implement it but I found no mention of find and replace like functionality.
You can tinker with PDF text but it is not straight-forward just to search and replace. The text is designed as an end-file format not for easy editing. I wrote a blog post explaining some of the issues at http://pdf.jpedal.org/java-pdf-blog/bid/12670/PDF-text
May be as workaround it's better to hold and fill in templates in some format that is more convenient for editing? E.g., you can keep your templates as Microsoft Word templates and then export them to PDF after filling. This thread may be useful on this way.
PDF file format isn't quite appropriate for editing.
Alternatively, you may prepare your templates as PDFs containing form fields. In this case filling of form fields is common and well-known task and there a lot of pdf components for this.
Related
I am searching from last two days but did not find any thing.
My requirement is to create a document viewer in my web application (C#.Net) and I don't want to use any third party tool for this. Can I convert the files in image or PDF or in any common formate which can be easly render on web page. I also can not use Introp object.
Any help will be highly appreciated
You mention in one of your comments that you'd like to write all the code yourself but don't know where to start. Here's how I would go about it...
First, you'll need to familiarize yourself with the Microsoft Office Format specification. You can find that here (there's a link to the technical specification). Office documents are actually a .zip file with an XML file inside along with any binary data representing attachments. Just renamed a .docx file as .zip and you'll be able to open it up and see the XML and any other supporting documents inside (same is true for xlsx, etc...).
Then you'll need to become intimately familiar with either PDF or HTML, as your job now will be to convert the various Office document structure into PDF or HTML structure, being sure to respect page layout, margins, order, etc...
As others have said, this is a large task which is why third party tools exist today. Also, each third party toolset has it's limitation as this is really hard to "get right" in all situations and there will be edge cases that work for one document and not another (because maybe they didn't use Microsoft Word to save the .docx, maybe they used OpenOffice and OpenOffice interpreted the standard slightly differently...)
If you cannot use COM/Interop technologies in your solution, you can take a look at the specialized 3rd party options. I see that you prefer not to use them, however, there are no existing built-in solutions in the .NET Framework. Check out my answer in a similar thread that describes how to accomplish exactly the same task using 3rd party libraries (for example, DevExpress, since I have experience with it). In addition, take a look at the Documents demo, where you can see how to create images/thumbnails from different types of MS Office documents.
I believe what you need is an intermediate representation of the documents which can be converted into an image for the viewer to display.
Lets me try to explain with the below diagram:
You can use tools like smallpdf or OfficeToPDF to do that. Just integrate them into your application.
Small PDF(https://smallpdf.com/library-detail)
officetopdf (https://officetopdf.codeplex.com/)
I need to create a C# or C++ (MFC) application that converts pdf files to txt. I need not only to convert, but remove headers, footers, some garbage characters on the left margin etc. Thus the application shold allow the user to set page margins to cut off what is not needed. I actually have already created such an application using xpdf, but it gives me some problems when I am trying to insert custom tags into the extracted text to preserve italics and bold. Maybe somebody could suggest something useful?
Thanks.
There are shareware and freeware utilities out there. Try fetching their source code, or perhaps use them the way they are.
A public version of the PDF specification can be found here: Adobe PDF Specification
PDF Shareware readers can be found: PDF Reader source code # SourceForge
Please look at Podofo. It's a LGPL-licensed library that has many powerful editing features. One of it's examples, txt2pdf IIRC, is a good start: it shows basic text-extraction; From there you can check if pre (in pdf engine) or post (in text) filtering suffices to your goals. I didn't get to use Pdf Hummus, but it's supposed to have these capabilities too, although it's less straightforward.
I'm using itextsharp to generate the PDFs, but I need to change some text dynamically.
I know that it's possible to change if there's any AcroField, but my PDF doen's have any of it. It just has some pure texts and I need to change some of them.
Does anyone know how to do it?
Actually, I have a blog post on how to do it! But like IanGilham said, it depends on whether you have control over the original PDF. The basic idea is you setup a form on the page and replace the form fields with the text you want. (You can style the form so it doesn't look like a form)
If you don't have control over the PDF, let me know how to do it!
Here is a link to the full post:
Using a template to programmatically create PDFs with C# and iTextSharp
I haven't used itextsharp, but I have been using PDFNet SDK to explore the content of a large pile of PDFs for localisation over the last few weeks.
I would say that what you require is absolutely achievable, but how difficult it is will depend entirely on how much control you have over the quality of the files. In my case, the files can be constructed from any combination of images, text in any random order, tables, forms, paths, single pixel graphics and scanned pages, some of which are composed from hundreds of smaller images. Let's just say we're having fun with it.
In the PDFTron way of doing things, you would have to implement a viewer (sample available), and add some code over a text selection. Given the complexities of the format, it may be necessary to implement a simple editor in a secondary dialog with the ability to expand the selection to the next line (or whatever other fundamental object is used to make up text). The string could then be edited and applied by copying the entire page of the document into a new page, replacing the selected elements with your new string. You would probably have to do some mathematics to get this to work well though, as just about everything in PDF is located on the page by means of an affine transform.
Good luck. I'm sure there are people on here with some experience of itextsharp and PDF in general.
This question comes up from time to time on the mailing list. The same answer is given time and time again - NO. See this thread for the official answer from the person who created iText.
This question should be a FAQ on the itextsharp tag wiki.
I'm in need of a solution to print or export (pdf/doc) from C#. I want to be able to design a template with place holders, bind an object (or xml) to this template, and get out a finished document.
I'm not really sure if this is a reporting solution or not.
I also don't want to have to roll my own printing / graphics code -- I'd like all display concerns handled in a template.
I initially think of this as something Crystal Reports can do (although I've never used CR), but I'm not sure if I'm abusing the system here -- I'm not really interested in binding ADO.NET datasets at the moment (screw datasets). Can Crystal deal with binding to objects?
Does SSRS or WPF play in this field too?
A subset of WPF-P is XPS which can be used to present your objects via databinding.
One of the best choices if you are already using WPF.
Google Keywords: XPS, FixedDocument, FlowDocument, WPF Printing
Might read through this thread:
http://groups.google.com/group/nhusers/browse_thread/thread/e2c2b8f834ae7ea8
Seems a lot of people like iTextSharp
http://itextsharp.sourceforge.net/
For Word docs, look into Word's Mail Merge feature and Word automation. I did this recently in a form letter printing project. Basically what I did was create a Word template file (file extension .dot) and in this template file I defined MergeFields in a standard form letter. My application queries a database for the records it needs to print and then for each record it returns it matches fields in the database with these merge fields and sends the result (the merged doc) to the printer.
It's working really well and if I had a link that gave a definitive explanation, I'd provide it (check back here, I'll see if I can't find the most useful ones). Hopefully I've provided enough keywords to let you find your own resources. I can go into more detail if you need.
I've never had to export PDF files but for a project I'm working on now I'll have to. For a free solution my research has lead to iTextSharp (like Will Shaver points out) but I've only done the initial investigations and I have found a few pay solutions I might end up resorting to.
Well, heres my scenario.
Client/Server winforms application with SQL Express as the DB. I need to be able to print invoice, packing slips etc..
i would like the customer to be able to modify the invoices. ie. be able to put their logo or change font sizes etc...basically format the display.
Things i have considered so far are.
1) Use a template engine (similar to codesmith or mygeneration) and use templates that output HTML. Then print the html page.
2) Use ReportViewer in local mode. I've heard that users can download a plugin for web dev express and edit the local report files. can anyone confirm this?
3) Use Reportviewer in remote mode.
I don't have much experience with ReportViewer so I'm not sure if i should use local or remote mode as well.
Those of you that have done this kind of thing before whats your recommendation?
After just completing a project with it, I would heartily recommend iTextSharp to create your invoices and other forms as PDFs. In addition to creating PDFs from scratch, you can also use it to fill in PDF forms and/or templates created with Acrobat (or even MS Office/OpenOffice). And it's free.
It's pretty easy to use in Windows apps or in ASP.Net applications. Most of the documentation and the books on it (iText in Action, for example) are about the original Java version, iText. However, there are tutorials and example code on the conversion process and, for the most part, all of the functions and libraries work the same in the .Net version, so adapting the book and reference code has been no problem.
I definitely learned the hard way that HTML and CSS are great for browsers (well, great except for the "every browser interprets it different" problem), but horrible for trying to generate consistent, attractive, and precise printed output and forms.
I'm personally using Aspose Words: they use word documents as templates, and I'm using Words bookmarks function to mark and retrieve the fields I need to fill.
Aspose works nicely with Tables (ie: you can add lines to a table, etc...) and sees Word documents as XML documents. You can then save the document as MSWord or PDF.
I wouldn't say it's the greatest library in the world, but it's definitely worth having a look :)
you can use Crystal Report for this. But first you need to scan the INVOICE and save it as an image,
Next is, on your crystal report, export the image on to it, and DRAG the fields to where they must print on the invoice (IMAGE SERVES AS YOUR GUIDE). Then after everything has been set-up, DELETE THE IMAGE and try it.
hope this helps.