Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I have a requirement to generate PDFs within our ASP.NET application. We need to meet the following requirements:
The text will be largely dynamic and must be added programmatically.
Ideally we'd like to base these generated documents off PDF templates provided by our designers.
Some of the sections/pages may or may not be visible depending on certain conditions - ideally we'd like the content to 'flow' upwards to fill the space when something is removed.
Some of the pages may need to repeat, depending on certain conditions.
Some of the PDF templates will be out of our control (we're populating application forms supplied by third parties), so being able to read in a blank PDF and populate it would be good.
I've looked at iTextSharp and it seems to do most of these things (i.e. I can take a PDF, edit it to include form fields where we need to fill in data, and then use iTextSharp to read that in as a template and populate the data). However, I'm not sure how to go about hiding whole sections and/or repeating pages.
What I'm looking for here is a little advice from anybody who's been in a similar situation.
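For reference, the form-filling approach described in the question can be sketched roughly as follows. This is a minimal sketch, assuming iTextSharp 5.x and a template that already contains AcroForm fields; the class name `FormFiller` and the field names you pass in are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

public static class FormFiller
{
    // Fills the AcroForm fields of a template PDF and writes the result to outputPath.
    // Assumption: iTextSharp 5.x; the keys in 'values' are field names defined in the template.
    public static void Fill(string templatePath, string outputPath,
                            IDictionary<string, string> values)
    {
        var reader = new PdfReader(templatePath);
        try
        {
            using (var output = new FileStream(outputPath, FileMode.Create))
            {
                var stamper = new PdfStamper(reader, output);
                foreach (var pair in values)
                {
                    // SetField returns false when no field with that name exists.
                    stamper.AcroFields.SetField(pair.Key, pair.Value);
                }
                // Flatten the form so the filled-in values are no longer editable.
                stamper.FormFlattening = true;
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

This covers only the "read a blank form and populate it" requirement. It does nothing to make content flow upwards when a section is removed.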
A small update: we went with iTextSharp. It's a powerful tool that takes a bit of learning, but it's quick, light, and does precisely what we want it to do.
However...
I would point out that the latest version is no longer really free (as in beer): newer versions are released under the AGPL, which in practice rules out use in a closed-source commercial product without buying a license. As a result, we now have a licensed version, but it's not cheap (and they don't publish a price list).
I would use iTextSharp. I create all kinds of PDF files based on different templates, and iText has worked best for me. It is a very powerful library and can manipulate PDF files in just about any way.
I'm not sure that iText can handle your third requirement. I know that it can create a PDF from an HTML file. Maybe use what Tomas posted and create the PDF with iText.
There are various commercial components to do this. Look at PDFLib (http://www.pdflib.com/), Tall Components (http://www.tallcomponents.com/), etc.
I have used products from Tall Components and can recommend them.
Some others are a lot more expensive.
iTextSharp, as you've mentioned, is quite good and can be used to add or remove external pages. In this case you could hide full pages by omitting them, or by replacing them with placeholder pages. iText can use existing PDF files or create blank pages.
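As a rough illustration of hiding or repeating whole pages, here is a minimal sketch, assuming iTextSharp 5.x and its `PdfCopy` class. The class name `PageAssembler` is hypothetical, and the caller decides which page numbers to include.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PageAssembler
{
    // Copies the listed 1-based page numbers of the source PDF, in the given order.
    // A page is hidden by leaving its number out and repeated by listing it twice.
    // Assumption: iTextSharp 5.x; at least one page number must be supplied,
    // because closing a document with no pages throws an exception.
    public static void Assemble(string sourcePath, string outputPath,
                                IEnumerable<int> pageNumbers)
    {
        var reader = new PdfReader(sourcePath);
        try
        {
            var document = new Document();
            using (var output = new FileStream(outputPath, FileMode.Create))
            {
                var copy = new PdfCopy(document, output);
                document.Open();
                foreach (int pageNumber in pageNumbers)
                {
                    copy.AddPage(copy.GetImportedPage(reader, pageNumber));
                }
                document.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

For example, `PageAssembler.Assemble("template.pdf", "out.pdf", new[] { 1, 2, 2, 4 })` would skip page 3 and include page 2 twice. Pages that carry form fields may need extra care when repeated, since the field names would then collide.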
From my personal experience, I have just dropped in images and text onto existing pages and added pages generated by a reporting tool when tabular data is required.
HTH
Why do the template files need to be in PDF format? Given how dynamic you want the output to be, PDF is a poor format for a reporting template. Use reporting software that can also output PDF. Your designers can create the templates very easily with a GUI. You will find many reporting solutions here on Stack Overflow.
We've used http://www.dynamicpdf.com/ for projects like this and have been pleased with the outcome. They have a WYSIWYG PDF designer tool that you can use to build the layout template of your PDF, along with a tool to merge the template with your data based on field-to-column mapping. This saves hours of hand-coding each element of a given PDF. It supports page headers and footers and deals with the complexities of page breaks.
I would consider creating HTML files from templates and turning them into PDF with Prince. Prince is very flexible, and creating HTML files from templates is much easier than creating PDF files directly.
Closed 3 years ago.
I have an Excel application with one entry page and one result page. Based on the data entered into the entry page and some static tables present, there are some other worksheets which calculate the values and sum it up later in the results page.
I need to convert this application to a web application and move the whole logic from excel to C#. There will be some UI pages and 1 results page, and the worksheet calculations currently in the excel application will be moved to different C# classes.
Is there any tool for this, or do we have to manually convert every piece of the existing logic to C#? If we have to do it manually, then is there any tool which can help summarize the logic in a text file, so that we don't have to backtrack the whole calculation in Excel to reconstruct the logic?
I doubt a tool could help you do this, and if such a tool existed I would expect the resulting C# code to be pretty bad. One of the strengths of Excel is that it permits a very flexible programming style; the downside is that the resulting code has very little structure to it, which makes it hard to follow, both for humans and machines. On top of that, the logic can take multiple forms, from an Excel formula to a VBA macro, which complicates matters.
By contrast, C# tends to see the world in terms of classes, which have very specific responsibilities; a program essentially coordinates passing messages between classes so that they can "talk to each other" and collaborate to get the job done.
In that context, at best, I would expect a translation tool to produce a few unpleasant C# procedures. In the end, a spreadsheet (with no VBA) is a set of functions sitting in cells and chained together; there isn't enough structure present to extract meaningful classes/entities that own pieces of functionality.
Furthermore, I would argue that re-thinking this application is an opportunity. A web app can easily do things Excel can't, and vice versa. Instead of a close "word-for-word" translation, I would focus on retaining the spirit of the application, but design it without thinking too much about the original Excel application.
The benefit of having the Excel application present is that you have a proof of concept already. Instead of trying to convert the code, I would simply track down all the input points that have an influence on other calculated fields (maybe using auditing), list or diagram what influences what (maybe with a simple bubbles-and-arrows diagram), and attempt to describe in plain English what the user is attempting to do, and what is happening in terms of "entities". For instance, rather than =A2*(1-A3), I would say "the Product Net Cost is its Cost times (1 - Discount Rate)". And instead of =SUM(A5:A32), I would say "the User wants a summary of the cost of the Products he ordered, which should display the Total Cost of his Order, so that he can have an overview of his order". If you manage to extract a description of your domain with good names and use cases, this will be more helpful to a developer trying to write the best possible application supporting these requirements, on any platform you may want.
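To make the two example formulas concrete, here is a small sketch of what they might look like once expressed as named entities in C#. The class and property names are illustrative only, and it assumes that the summed range holds the net cost of each product, which the original spreadsheet may or may not do.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Product
{
    public Product(decimal cost, decimal discountRate)
    {
        Cost = cost;
        DiscountRate = discountRate;
    }

    public decimal Cost { get; }
    public decimal DiscountRate { get; }

    // Spreadsheet equivalent: =A2*(1-A3)
    public decimal NetCost => Cost * (1 - DiscountRate);
}

public sealed class Order
{
    private readonly List<Product> _products = new List<Product>();

    public void Add(Product product) => _products.Add(product);

    // Spreadsheet equivalent: =SUM(A5:A32)
    // Assumption: the summed cells are the net costs of the ordered products.
    public decimal TotalCost => _products.Sum(p => p.NetCost);
}
```

The point is not the code itself but that every cell reference has been replaced by a name taken from the plain-English description, so the intent survives a move to any platform.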
I wouldn't try to "convert" the solution. Build it in C# from scratch. They are written in different languages and the UI cannot be reused.
I agree with fabiopagoti. Even if a magical "converter" were available, what human would dare to maintain this coded-by-machine application through future updates?
As for your second question about an alternative helper, I've found that the [CTRL]+[`] key combination may give you a hand here: this shortcut displays the formulas in all cells instead of their values, so you can be more confident that you're not missing anything. I hope.
PS: To text file? Just apply the shortcut and "print" the sheet to a PDF file.
Maybe you can use Google Docs. It allows you to create scripts and use formulas.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 12 months ago.
For my Microsoft Surface application I'd like to generate a PDF including some images. How should I start to do that?
EDIT:
I don't want to convert a text or html document, but I want to create a pdf from scratch.
PDFsharp is nice, free, relatively easy to use and compatible with WPF.
Two basic options:
1) use a PDF specific library, like iText in #Kent's answer
2) install a PDF printer and use any reporting/printing code.
Option 1) will be the most flexible and efficient way.
Option 2) is interesting when you already have code that prints what you want. It is a bit of a hassle to manage the output file etc., but it makes it very easy to support XPS as well.
The Docotic.Pdf library can be used to create PDF files from scratch and for many other purposes.
Please take a look at samples for various PDF tasks.
The library has no external dependencies and is written in C#. There are several different license types available.
Disclaimer: I work for Bit Miracle.
For me the easiest way is to use Reporting Services.
http://www.microsoft.com/sqlserver/2008/en/us/reporting.aspx
You should also have a look at Fop/PdfBox for PDF creation/editing.
You need to use one of the PDF generation libraries for C#. I tried iText, IronPDF and PDFFlow. All of them can create PDF documents from scratch.
But PDFFlow was better for my needs, because it has an easy fluent syntax and more functionality (I needed repeating headers, automatic page creation, automatic page numbering and tables spread over multiple pages). They also publish examples of business documents, which I haven't seen much of from other PDF generation libraries. There are also how-to-build articles for each sample, which helped me a lot.
This is how to create a simple PDF file:
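// Requires a reference to the PDFFlow library.
DocumentBuilder.New()
    .AddSection()
        // Paragraph and image are added to the section.
        .AddParagraphToSection("Hello world!")
        .AddImage("image.png").SetWidth(250)
    .ToSection()
        // Red line under the image.
        .AddLine(300, Color.Red)
    .ToDocument()
    // Writes the finished document to disk.
    .Build("Result.PDF");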
Hope this helps you get started.
Closed 8 years ago.
I have been tasked with evaluating the system that we currently use for creating and outputting reports.
Currently we are using Crystal Reports 2008 (I know that this is an old version), together with a custom command-line app that we wrote in C# to execute the report for a given parameter passed on the command line.
We like Crystal because it's easy to set up and design a report. It's also easy to print and to create a PDF file from Crystal using our custom command-line program.
One of the problems/complaints that we have is that Crystal does not appear to offer a way to create a PDF file with vector images, such as our company logo. Crystal Reports always converts an image into a bitmap. When the PDF is printed, the results are less than flattering, and the PDF file size is increased.
Does anyone have any recommendations for a reporting product that we should consider?
iTextSharp supports importing WMF as a vector image. Maybe other formats too.
See the sample here. N.B.: it seems to be a bit outdated; you'll need to replace 'getInstance' with 'GetInstance'.
www.hagridsolutions.com/xtraction
Offers easier use than Crystal and a rich export that can cater for exporting data into a MS Word template (that could contain vector images, headers, table of contents) and also export this into PDF or HTML format.
Design is drag-and-drop with no coding or dependence on specialized staff whatsoever.
You can define the reports once and have them scheduled to output to PDF, saved to the system to be viewed online or to a file system.
The dates can be rolling (as in Last Week, Last Month) and so always deliver based on what you need.
The design is drag-and-drop, the dashboards are interactive, the reports are available when you need them, and there is security to lock down access to the dashboards/reports and control of who can design dashboards/reports. The flexibility is surely there for whatever combination is needed.
I think that combit's List & Label will fit this requirement.
www.combit.de
However, the support for EMFs is not perfect; it works well for graphics of small and medium complexity.
Closed last month.
I used LaTeX for writing a couple of white papers while I was in grad school. From that I have a really good impression of what LaTeX allows the user to do, especially the fine control it provides over formatting, etc.
Now I am debating whether I should actually use LaTeX for our production apps to generate PDFs. I have tried several commercial and free PDF libraries (PDFSharp/MigraDoc, iTextSharp, Expert PDF, etc.) and none of them provides the amount of fine control I need without making the code base extremely difficult to maintain in the long run.
If I do decide to go this route, it will be done from C#. Since LaTeX already has a command-line interface, I should be able to do that pretty easily from C# by launching it as an external process from my program.
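A minimal sketch of that external-process idea, assuming a `pdflatex` executable is on the PATH (for example from MiKTeX or TeX Live). The class name `LatexRunner` and the default timeout are arbitrary choices for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class LatexRunner
{
    // Runs pdflatex on a .tex file and returns the expected path of the PDF.
    // Assumption: "pdflatex" resolves via PATH; adjust FileName otherwise.
    public static string Compile(string texPath, string outputDirectory,
                                 int timeoutMs = 60000)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "pdflatex",
            // nonstopmode and halt-on-error keep TeX from waiting for keyboard input.
            Arguments = "-interaction=nonstopmode -halt-on-error " +
                        $"-output-directory=\"{outputDirectory}\" \"{texPath}\"",
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Read the log asynchronously so a full pipe buffer cannot block the child.
            var logTask = process.StandardOutput.ReadToEndAsync();

            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                throw new TimeoutException("pdflatex did not finish in time.");
            }

            string log = logTask.Result;
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("pdflatex failed:\n" + log);
            }
        }

        return Path.Combine(outputDirectory,
                            Path.GetFileNameWithoutExtension(texPath) + ".pdf");
    }
}
```

Documents with cross-references or a table of contents usually need more than one pass, so a real implementation would likely call this two or three times per document.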
But I was looking for some comments from the community.
Has anyone tried it? If so, what were some gotchas?
What do you think about the idea -- pros and cons (I am more interested in gotchas)?
All feedback is welcome.
I have previously built a platform for report generation that uses plain TeX (specifically the MiKTeX implementation) to generate reports in PDF format. The platform is used to generate approximately fifty reports per month of varying nature (containing mostly dynamically generated tables and charts). The system is quite flexible. Reports are defined via XML (using an internally defined report description schema). The platform allows the user to specify a source database table, which fields to extract, the formatting of the fields, a mini query language to filter the appropriate data, as well as various formatting elements (page orientation, size, titles, and classifications such as "Public", "Internal", "Confidential", etc.).
The main "gotcha" is that it takes a ton of work to end up with a code base that is flexible to change and not a total pain to maintain. A second "gotcha" is that knowledge of TeX (outside of academics) is rare so you could end up becoming the de facto maintainer even if that is not part of your usual role.
Pros:
Beautifully formatted reports.
Complete control over layout and look.
Free.
Cons:
Difficult to implement properly.
Difficult to maintain.
Knowledge transition could be burdensome.
Support is effectively nonexistent.
I've done a few in-house "production level" documents in LaTeX.
Generating LaTeX documents on Windows is an overall horrible experience, to be honest. I was never able to find any solution besides Cygwin. Once I had the Cygwin environment up and running, it was as simple as picking out LaTeX and the related libraries from Cygwin's setup.exe.
I haven't tried running LaTeX from a non-Cygwin environment, but in theory you should be able to just run C:\Cygwin\usr\bin\latex.exe. There's a chance it will be missing paths since you're not in Bash, in which case you might need to pass the include directories to subsequent programs.
If you decide to use DocBook instead of LaTeX for your documentation (and I would recommend at least giving it a look; it's much more structured for software-related technical documentation), I had a good experience running dblatex under Cygwin. It's not in the Cygwin repositories, but it's a piece of cake to install from source.
I have done various production PDF implementations using TeX. I ended up abandoning LaTeX, and went with ConTeXt (see also Context Garden).
There is a very active mailing list, it is used extensively for document production, and there is a nice minimal distribution for various Unixes, Windows and Mac OS X. There is no need for Cygwin on Windows (although you do need Ruby).
I find ConTeXt's approach to TeX cleaner than LaTeX's, but that might just be me.
If you need to publish data summaries and graphs, then you can have a look at Sweave. Sweave lets you mix all the functionality of R with TeX. The source code of a report consists of a plain TeX file with R code chunks wherever you need to read, manipulate, tabulate or plot data. Then you 'compile' the Sweave file (from the command line), which returns a plain TeX file.
Closed 5 years ago.
Is there a library that has a class to extract the text from a PDF file in C#/.NET? I've tried a few, but the documentation is terrible, so I haven't been able to get anything off the ground. If it also provides a class to extract images, that would be a plus. Any suggestions? Thanks in advance.
Also I need to be able to implement it into an existing application.
Have you tried PDFKit.NET? It has reasonable docs and some good examples. It is designed for a server environment, so it is a little expensive.
EDIT Here is an open source library on SourceForge called iTextSharp. It is free for open source projects. I haven't used it, but it looks promising. Here is a tutorial for it that has lots of code examples.
There are a couple of ways you can go here -- a lot of it will depend on whether you want to retain the formatting (i.e., paragraphs and other layout elements) of the original PDF.
If you're considering commercial solutions, we do offer two products that might meet your requirements. One is EasyPDF SDK which has single shot ExtractText() and ExtractText2() calls that pull text out of your PDFs as plain text.
Note that the output from these calls is pretty simplistic and you will lose a lot of the original layout elements. They're nice for simple text extraction but might not be great if your PDF contains tabular data.
If you're dealing with tables, a nicer alternative could be to pull the content out as rich text instead. We have a tool called EasyConverter SDK, geared towards business documents, which does just that using a single function call.
With EasyConverter SDK, the layout of your original PDF will be retained.
Both support C# so feel free to check out the eval versions at www.pdfonline.com if you're interested. I do work for the vendor so do take this suggestion as kind of a mother loving her own child :-) I've been browsing stackoverflow.com for code snippets for a long time, but have only recently started posting, so if you have any questions with either API just let me know and I can help. Cheers!
The Docotic.Pdf library can extract text and images from PDF files.
You can extract text from the whole document or from some pages only. The library can extract plain text and also text chunks with coordinates.
You can extract images from PDFs (as JPEG and TIFF files).
Here are a couple of samples for your task:
Extract text from PDFs
Extract images from a PDF
Disclaimer: I work for Bit Miracle, vendor of the library.
We've used Snowbound Software at work for image conversion. It apparently supports text extraction too. However, it's not free.