Just wondering if anyone could tell me of a simple way to create files for printing? At the moment I'm just scripting HTML, but I'm wondering if there isn't some easier way of doing it that would give me more control over what it being printed? Something along the lines of an Access printout, or Excel printout - where I could decide how to lay things out and almost "Mail merge" the details in via programming.
Basically, I want to create something for print that can have tables encasing it, and could be longer or shorter for each record depending upon the number of foreign keys (e.g. one staff member could have 10 jobs today, or just 3. I want to create a document that will generate and print).
Any ideas/advice/opinions? Thank you!
EDIT: Wow, thanks for all the responses! For this particular task, FlowDocuments seems to be the closest to what I'm actually after so I'll play with that. Either way I have several really good options now.
EDIT 2: After some playing, iTextSharp has become the choice for me. For anyone wondering in the future, here is a link to a great and simple tutorial: http://www.mikesdotnetting.com/Category/20
Thanks again!
I would create a PDF file which can be viewed just about anywhere and will maintain formatting. Take a look here: http://itextsharp.sourceforge.net/
There's always FlowDocuments. Check out the overview at MSDN
http://msdn.microsoft.com/en-us/library/aa970909.aspx and see if they match what you want to do. They're pretty easy to print and can be serialized to xaml. Might not be exactly what you're after, but they're pretty useful.
We are currently using PDFSharp with great success -
http://www.pdfsharp.com/PDFsharp/
GDI+ or WPF ... all .NET, not COM or interop.
Oh, and its open source. Here is some sample code -
http://www.pdfsharp.net/wiki/PDFsharpSamples.ashx
If you use PDF or XPS generator, it still requires you to define the document composition very much like scripting your HTML, so I dont see that it gives you much more values other than the created file is in print ready format.
What you need is something that you can design a template and just filling in the blank, so I suggest that you either go for Word or Excel automation, otherwise look at some lightweight report generation library. I come across this and maybe it is worth checking out too.
http://www.fyireporting.com/
Like David i also recommended I Text Sharp ;) It's relly easy to create pdf document with this ;) I use it in ASP.NET project. It have much of options to format pdf file, in my example i use basic ;)
Example:
string file = #"d:\print.pdf"; //path to pdf file
Document myDocument = new Document(PageSize.A4.Rotate());
PdfWriter.GetInstance(myDocument, new FileStream(file, FileMode.Create));
myDocument.Open();
//data to save in pdf- unimportant!
Opiekun obiekun = (from opiekunTmp in db.Opiekuns where opiekunTmp.idOpiekun == nalez.Dziecko.idOpiekun select opiekunTmp).SingleOrDefault();
Dziecko dzieckoZap = (from dzieckoTmp in db.Dzieckos where dzieckoTmp.idDziecko == nalez.idDziecko select dzieckoTmp).SingleOrDefault();
//some info about font
BaseFont times = BaseFont.CreateFont(BaseFont.TIMES_ROMAN, BaseFont.CP1250, BaseFont.EMBEDDED);
Font font = new Font(times, 12);
myDocument.Add(new Paragraph("--------------------------Raport opłaty--------------------------",font));
myDocument.Add(new Paragraph("Data rozliczenia: " + (((TextBox)this.GridViewOplaty.Rows[e.RowIndex].Cells[8].Controls[0]).Text), font));
myDocument.Add(new Paragraph("Płatnik: " + obiekun.Imie + " " + obiekun.Nazwisko, font));
myDocument.Add(new Paragraph("Dziecko: " + dzieckoZap.Imie + " " + dzieckoZap.Nazwisko, font));
myDocument.Add(new Paragraph(""));
myDocument.Add(new Paragraph("Data Podpis płatnika: " + obiekun.Imie + " " + obiekun.Nazwisko, font));
myDocument.Add(new Paragraph(""));
myDocument.Add(new Paragraph(" ........... ................................."));
myDocument.Close(); //we close the pdf and open
System.Diagnostics.Process.Start(file); //and open our file if You want that ;)
I have created printouts from web sites using Excel XML format. Basically this means you don't have to use the Office APIs to generate the document (which can be cumbersome and requires extra libraries on the web server). Instead, you can just take an XML template, use XPath, LINQ to XML, or other technologies to insert your data into the template, and then stream it to the user and they can print it.
Generating the template is easy. You just use Excel to create the document and then save it in the "XML Spreadsheet" format. The XML is a bit oppressive but it isn't terrible.
Documentation on the XML Spreadsheet format is here:
http://msdn.microsoft.com/en-us/library/aa140066%28office.10%29.aspx
Note that the documentation is for Excel 2002. The format does change in newer versions of Excel, but it is backwards compatible.
We use ActiveReports. It is very easy to define your layout and can print and export in a number of formats, pdf, rtf, excel, tiff, etc.
If you're looking for a very simple way to create docs (prob not the best, but it sure is easy), you can set up Word docs with bookmarks and insert data into the bookmarks through code, so I'm guessing this would work for Excel too (if they have bookmarks?):
EDIT: Here's a quick translation into c# (been tested in vb, but not c#):
Word.Application oWord = default(Word.Application);
Word.Document oDoc = default(Word.Document);
oWord = Interaction.CreateObject("Word.Application");
oWord.Visible = false;
oDoc = oWord.Documents.Add(Directory + "\\MyDocument.dot");
oDoc.Bookmarks("MyBookmark").Range.Text = strBookmark;
oDoc.PrintOut();
oDoc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
oDoc = null;
oWord.Application.Quit();
oWord = null;
I've done the word and pdf generation things in the past and the support for pdf generation in pdfsharp (#Kris) is pretty good and I would use it ahead of office automation.
I hope I've not mis-read your needs but rather than exporting in a specific format and then firing the print feature I would these days re-consider plain old browser printing. In the past the awful limitations of browser printing meant that printing was a bad experience (no shrink-to-fit etc). But modern browsers have sufficient print support to be acceptable for simple jobs.
I've just checked printing this page in Firefox 3.5.8 and IE8 and both support shrink-to-fit and I reckon a simple print stylesheet will generate nice looking job sheets straight out of the browser as long as your (presumably internal) audience is guaranteed to have a modern browser.
XPS is MSFT's solution to print structured and formatted documents. I'm not saying to send someone an XPS, just to use the .NET support for printing via the XPS framework.
http://roecode.wordpress.com/2007/12/21/using-flowdocument-xaml-to-print-xps-documents/
It's very easy to do if you're doing WPF. Essentially an XPS is MSFT's PDF. Anyone running Vista or 7 can view/print them fine.
Related
I try to build an application that can convert a PDF to an excel with C#.
I have searched for some library to help me with this, but most of them are commercially licensed, so I ended up to iTextSharp.dll
It's good that is free, but I rarely find any good open source documentation for it.
These are some link that I have read:
https://yoda.entelect.co.za/view/9902/extracting-data-from-pdf-files
https://www.mikesdotnetting.com/article/80/create-pdfs-in-asp-net-getting-started-with-itextsharp
http://www.thedevelopertips.com/DotNet/ASPDotNet/Read-PDF-and-Convert-to-Stream.aspx?id=34
there're more. But, most of them did not really explain what use of the code.
So this is most common code in IText with C#:
StringBuilder text = new StringBuilder(); // my new file that will have pdf content?
PdfReader pdfReader = new PdfReader(myPath); // This maybe how IText read the pdf?
for (int page = 1; page <= pdfReader.NumberOfPages; page++) // looping for read all content in pdf?
{
ITextExtractionStrategy strategy = new LocationTextExtractionStrategy(); // ?
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy); // ?
currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.UTF8.GetBytes(currentText))); // maybe how IText convert the data to text?
text.Append(currentText); // maybe the full content?
}
pdfReader.Close(); // to close the PdfReader?
As you can see, I still do not have a clear knowledge of the IText code that I have. Tell me, if my knowledge is correct and give me an answer for code that I still not understand.
Thank You.
Let me start by explaining a bit about PDF.
PDF is not a 'what you see is what you get'-format.
Internally, PDF is more like a file containing instructions for rendering software. Unless you are working with a tagged PDF file, a PDF document does not naturally have a concept of 'paragraph' or 'table'.
If you open a PDF in notepad for instance, you might see something like
7 0 obj
<</BaseFont/Helvetica-Oblique/Encoding/WinAnsiEncoding/Subtype/Type1/Type/Font>>
endobj
Instructions in the document get gathered into 'objects' and objects are numbered, and can be cross-referenced.
As Bruno already indicated in the comments, this means that finding out what a table is, or what the content of a table is, can be really hard.
The PDF document itself can only tell you things like:
object 8 is a line from [50, 100] to [150, 100]
object 125 is a piece of text, in font Helvetica, at position [50, 110]
With the iText core library you can
get all of these objects (which iText calls PathRenderInfo, TextRenderInfo and ImageRenderInfo objects)
get the graphics state when the object was rendered (which font, font-size, color, etc)
This can allow you to write your own parsing logic.
For instance:
gather all the PathRenderInfo objects
remove everything that is not a perfect horizontal or vertical line
make clusters of everything that intersects at 90 degree angles
if a cluster contains more than a given threshold of lines, consider it a table
Luckily, the pdf2Data solution (an iText add-on) already does that kind of thing for you.
For more information go to http://pdf2data.online/
I am trying here to create a piece of code which would open an existing PDF Form (previously created with Open Office) with empty controls and set the values using iTextSharp (). I am still at the testing of iTextSharp to see if it does whatever I need to do, and so far the answer is no.
Please see what I've done below according to what I found on the net:
string fileNameExisting = #"PdfTemplate.pdf";
string fileNameNew = #"new.pdf";
using (var existingFileStream = new FileStream(fileNameExisting, FileMode.Open))
using (var newFileStream = new FileStream(fileNameNew, FileMode.Create))
{
// Open existing PDF
var pdfReader = new PdfReader(existingFileStream);
// PdfStamper, which will create
var stamper = new PdfStamper(pdfReader, newFileStream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
foreach (string fieldKey in fieldKeys)
{
bool result = form.SetField(fieldKey, "A lot of text here.");
}
stamper.Close();
pdfReader.Close();
}
Issue 1
iTextSharp only recognizes 'controls' elements from Open Office (Textboxes for example). I tried to add a table to the PDF Template but it doesn't appear in the fields. Which means I am really limited in what to use.
Issue 2
When I set the text in the fields, there is no wraping of the text, and the size of the controls is not dynamic which means if the text is too long, it doesn't all appear. I can't use the scroll bar as the PDF is to print.
I tried
For the first issue, I created a PDF Form with Word instead of Open Office Writer. However, iTextSharp does not recognize any of the controls from Word, my fields collection is empty..
For the second issue, I tried to modify every properties of the controls in Open Office, looked on the internet to see if someone had a solution. But from what I understood, the size is fixed as it is AcroFields, so I can't make the control dynamic and can't change the size afterwards with iTextSharp.
I was hoping someone went through the same situation and would be able to guide me either with iTextSharp, or another library, free or not. I can't afford a £2000 license though as I am running my own business, but I am open to suggestions as I need to deliver.
The last option is to create the PDF from scratch with iTextSharp, but it's not as fast and easy to produce as the modification, and it means that for every update of the PDF, the company would need me to change the code... I'm not very pleased with that solution.
Issue 1:
A table is not a form field. Please read the PDF specification, more specifically ISO-32000-1:
There is no such thing as a dynamic table in PDF. That is only possible in XFA (which is XML wrapped in a PDF file), but XFA is being deprecated. At iText, we'll release a (closed source) product in February 2017 for dynamic documents.
Issue 2:
The text only wraps if the field is defined as a multi-line text field. See for instance https://developers.itextpdf.com/question/how-get-row-count-multiline-field
The font size only adapts to the size of the field if you set the font size to 0: Set AcroField Text Size to Auto
Summarized:
Dynamic forms and PDF either require XFA in which case you need to buy Adobe LiveCycle ES (which is way above your budget), or you need to wait until iText Group releases its dynamics forms project (but that will also be more expensive than £2000).
I am generating PDF using iTextSharp.dll, but the problem is that I am not able to apply that CSS. I have one div:
<div id="personal" class="headerdiv">
Personal Data
</div>
now my .aspx.cs code is like this:
iTextSharp.text.html.simpleparser.StyleSheet styles = new iTextSharp.text.html.simpleparser.StyleSheet();
styles.LoadTagStyle("#headerdiv", "height", "30px");
styles.LoadTagStyle("#headerdiv", "font-weight", "bold");
styles.LoadTagStyle("#headerdiv", "font-family", "Cambria");
styles.LoadTagStyle("#headerdiv", "font-size", "20px");
styles.LoadTagStyle("#headerdiv", "background-color", "Blue");
styles.LoadTagStyle("#headerdiv", "color", "White");
styles.LoadTagStyle("#headerdiv", "padding-left", "5px");
HTMLWorker worker = new HTMLWorker(document);
worker.SetStyleSheet(styles);
// step 4: we open document and start the worker on the document
document.Open();
worker.StartDocument();
// step 5: parse the html into the document
worker.Parse(reader);
// step 6: close the document and the worker
worker.EndDocument();
worker.Close();
document.Close();
There's a couple of things going on here. First and foremost, the HTML/CSS parser in iText and iTextSharp are far from complete. They're definitely very powerful but still have a ways to go. Each version gets better so always make sure that you're using the latest version.
Second, I've seen more HTML/CSS activity in an add-on for iText/iTextSharp called XMLWorker that you might want to look at. You don't "load styles" anymore, you just pass raw HTML/CSS in and it figures out a lot of things. You can see some examples here, see a list of supported CSS attributes here, download it here (and get the two missing files here and here).
Third, LoadTagStyle is for loading style attributes for HTML tags, not CSS IDs or Classes. You want to use LoadStyle to load by class:
styles.LoadStyle("<classname>", "<attribute>", "<value>");
Unfortunately this method still doesn't do what you want it to do always. For instance, to change the font size you'd think you'd say:
styles.LoadStyle("headerdiv", "font-size", "60ptx);
But to get it to work you can only use relative HTML font sizes (1,2,-1, etc) or PT sizes and you must use the size property:
styles.LoadStyle("headerdiv", "size", "60pt");
//or
styles.LoadStyle("headerdiv", "size", "2");
The LoadStyle honestly feels like an afterthought that was only partially completed and I recommend not using it actually. Instead I recommend writing the style attributes directly inline if you can:
string html = "<div id=\"personal\" class=\"headerdiv\" style=\"padding-left:50px;font-size:60pt;font-family:Cambria;font-weight:700;\">Personal Data</div>";
Obviously this defeats the points of CSS and once again, that's why they're working on the new XMLWorker above.
Lastly, to use fonts by name you have to register them with iTextSharp first, it won't go looking for them:
iTextSharp.text.FontFactory.Register(#"c:\windows\fonts\cambria.ttc", "Cambria");
In case someone is still having issues with this. The latest version of itextsharp (currently 5.3.2) significantly improves the HTMLWorker processor.
you can get it here: http://sourceforge.net/projects/itextsharp/
The correct way to reference the backgroud color is through the HtmlTags class
styles.LoadTagStyle(HtmlTags.HEADERCELL, HtmlTags.BACKGROUNDCOLOR, "Blue");
I have a pdf with a form in it. I am trying to write a class that will take data from my database and automatically populate the fields in the form.
I have already tried ITextSharp and their pricing is out of my budget, even though it works perfectly fine with my pdf. I need a free pdf parser that will let me import the pdf, set the data, and save the PDF out, preferably to a stream so that I can return a Stream object from my class rather than saving the pdf to the server.
I found this pdf reader and it doesn't work. Null reference errors are abundant and when I tried to "fix" them, it still couldn't find my fields.
So, I have moved on to PdfBox, as the documentation says it can manipulate a PDF, however, I cannot find any examples. Here is the code I have so far.
var document = PDDocument.load(inputPdf);
var catalog = document.getDocumentCatalog();
var form = catalog.getAcroForm();
form.getField("MY_FIELD").setValue("Test Value");
document.save("some location on my hard drive");
document.close();
The problem is that catalog.getAcroForm() is returning a null, so I can't access the fields. Does anyone know how I can use PdfBox to alter the field values and save the thing back out?
EDIT:
I did find this example, which is pretty much what I am doing. It's just that my acroform is null in pdfbox. I know there is one there because itextsharp can pull it out just fine.
Have you tried with the 1.2.1 version?
http://pdfbox.apache.org/apidocs/overview-summary.html
I have spent about 20 hours of coding to produce invoices using iText in c#.
Now, i want to use the same code to transform some of the tables to html.
Do you know if i can do this?
For instance i have this:
PdfPTable table = new PdfPTable(3);
table.DefaultCell.Border = 0;
table.DefaultCell.Padding = 3;
table.WidthPercentage = 100;
int[] widths = { 100, 200, 100};
table.SetWidths(widths);
List listOfCompanyData = (List)getCompanyData();
List listOfCumparatorDreaptaData = (List)getCumparatorDreaptaData(proformaInvoice.getCumparatorDreapta());
table.AddCell((Phrase)listOfCompanyData.Items[0]);
table.AddCell("");
table.AddCell((Phrase)listOfCumparatorDreaptaData.Items[0]);
and i want to transform this table into html...
Is it possible?
PDFs and HTML are fundamentally different display technologies. PDF is much more complex then HTML is, which is why you find so many HTML to PDF converters. The other way around is much more difficult.
iText can only do do it from HTML to PDF.
There are online converters that will take a PDF and convert it to HTML. There are also downloadable utilities.
I am not aware of any .NET library that will do this.
PDF is almost a write-only format. Any time your workflow calls for "get the data out of a PDF", you've probably screwed up.
Having said that, there are several ways to stash data within a PDF:
Form fields have no particular length limit and need not be visible. Getting form data with iText is trivial.
You can attach a file to a PDF and suck it out later, both with iText.
DocInfo fields. You can stuff a string into one of the author/title/keywords/etc metadata fields. An ugly hack, but effective.
XML metadata. The "new-fangled" metadata is stored in an XML schema. You can put pretty much whatever you want in there... though iText regenerates some of it every time it makes changes (mod date and such).
Custom keys/values. You can tack any old key/value pairs you like into any old dictionary within a PDF. Adobe would like you to register a company-specific prefix for your custom tags to avoid collisions, but I've never felt the need.
From the book iText in Action it seems that it is doable using the original java library, but it does not seem like it is no longer ported in the c# lib. I'm pretty sure it was in version 4 :-/
Try look at some old source here: http://www.koders.com/csharp/fid60B0985D3A89152128B73F54EDD4EB5420A5E4D8.aspx?s=%22Ken+Auer%22
nFOP + XSLT + XML = pdf | doc | HTML
nfop.sourceforge.net/article.html should give you an idea on how to use it, you need "Microsoft Visual J # NET Redistributable Package" to run nFOP
open source no cost :)
K