How to read PDF form data using iTextSharp?

I am trying to find out if it is possible to read PDF Form data (Forms filled in and saved with the form) using iTextSharp. How can I do this?

You would have to find out the field names in the PDF form. Get the fields and then read their values.
string pdfTemplate = "my.pdf";
PdfReader pdfReader = new PdfReader(pdfTemplate);
AcroFields fields = pdfReader.AcroFields;
string val = fields.GetField("fieldname");
In the code above, "fieldname" is the name of the PDF form field, and the GetField method returns a string representation of that field's value.
Here is an article with example code that you could probably use. It shows how you can both read and write form fields using iTextSharp.

Maybe the iTextSharp library has changed recently, but I wasn't able to get the accepted answer to work. Here is my solution:
var pdf_filename = "pdf2read.pdf";
using (var reader = new PdfReader(pdf_filename))
{
    var fields = reader.AcroFields.Fields;
    foreach (var key in fields.Keys)
    {
        var value = reader.AcroFields.GetField(key);
        Console.WriteLine(key + " : " + value);
    }
}
The difference is subtle: reader.AcroFields.Fields returns an IDictionary (field name to AcroFields.Item) rather than an AcroFields object, so GetField has to be called on reader.AcroFields itself.
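If you want the values in a plain dictionary (for example, to use them after the reader has been disposed), the loop above can be wrapped in a small helper. This is a sketch assuming iTextSharp 5.5.x; the class name PdfFormReader and the byte[] input are illustrative choices, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class PdfFormReader
{
    // Returns a snapshot of every AcroForm field: fully qualified name -> value.
    // Assumes iTextSharp 5.5.x, where AcroFields.Fields is an
    // IDictionary<string, AcroFields.Item> and GetField lives on AcroFields.
    public static Dictionary<string, string> ReadAllFields(byte[] pdfBytes)
    {
        var values = new Dictionary<string, string>();
        using (var reader = new PdfReader(pdfBytes))
        {
            AcroFields form = reader.AcroFields;       // the form object
            foreach (string name in form.Fields.Keys)  // its name -> Item dictionary
            {
                values[name] = form.GetField(name);
            }
        }
        return values; // still usable after the reader is disposed
    }
}
```

Usage: `var values = PdfFormReader.ReadAllFields(File.ReadAllBytes("pdf2read.pdf"));`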

If you are using PowerShell, the discovery code for fields is:
Add-Type -Path C:\Users\Micah\Desktop\PDF_Test\itextsharp.dll
$MyPDF = "C:\Users\Micah\Desktop\PDF_Test\something_important.pdf"
$PDFDoc = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $MyPDF
$PDFDoc.AcroFields.Fields
That code will give you the names of all the fields on the PDF Document, "something_important.pdf".
This is how you access each field once you know the name of the field:
$PDFDoc.AcroFields.GetField("Name of the field here")

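This worked for me!
Note the extra parameters when creating the stamper: '\0', true. The '\0' keeps the original PDF version, and true opens the stamper in append mode, so the changes are saved as an incremental update.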
string TempFilename = Path.GetTempFileName();
PdfReader pdfReader = new PdfReader(FileName);
//PdfStamper stamper = new PdfStamper(pdfReader, new FileStream(TempFilename, FileMode.Create));
// '\0' = keep the original PDF version, true = append mode (incremental update)
PdfStamper stamper = new PdfStamper(pdfReader, new FileStream(TempFilename, FileMode.Create), '\0', true);
AcroFields fields = stamper.AcroFields;
AcroFields pdfFormFields = pdfReader.AcroFields;
foreach (KeyValuePair<string, AcroFields.Item> kvp in fields.Fields)
{
    // GetXMLNode is the answerer's own helper (not shown); presumably it returns
    // the value stored for this field name in XMLFile, or "" if there is none
    string FieldValue = GetXMLNode(XMLFile, kvp.Key);
    if (FieldValue != "")
    {
        fields.SetField(kvp.Key, FieldValue);
    }
}
stamper.FormFlattening = false;
stamper.Close();
pdfReader.Close();
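The answer does not show GetXMLNode. Here is a hypothetical version, assuming the field values are stored in an XML file as elements named after the form fields (e.g. `<FirstName>Thomas</FirstName>`); the class name XmlFieldSource and the XDocument parameter are assumptions, not the answerer's code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlFieldSource
{
    // Hypothetical replacement for the answer's GetXMLNode helper.
    // Assumes the XML stores one element per form field, named after the field,
    // e.g. <data><FirstName>Thomas</FirstName></data>.
    // Returns the element's text, or "" when no element has that name, which
    // fits the answer's `if (FieldValue != "")` check.
    public static string GetXMLNode(XDocument xml, string fieldName)
    {
        XElement match = xml.Descendants()
                            .FirstOrDefault(e => e.Name.LocalName == fieldName);
        return match?.Value ?? "";
    }
}
```

In the loop above you would load the XML once before the foreach (for example `var xml = XDocument.Load(XMLFile);`) and call `XmlFieldSource.GetXMLNode(xml, kvp.Key)`. Fully qualified AcroForm names such as `form1[0].name[0]` are not valid XML element names, so they never match and come back as "".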

The PDF is named "report.pdf".
The field to read into TextBox1 is "TextField25" in the PDF.
Dim pdf As String = "report.pdf"
Dim reader As New PdfReader(pdf)
Dim fields As AcroFields = reader.AcroFields
TextBox1.Text = fields.GetField("TextField25")
Important: this works only if the PDF was not flattened when it was created with iTextSharp (the fields must still be editable), i.e. it was generated with:
pdfStamper.FormFlattening = False
This is very simple, and it works like a charm. :)

If anybody is still wondering about this, here is how I extracted the text in a field with iText 7 (provided you know the field name):
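// needs: using iText.Forms; using iText.Kernel.Pdf;
PdfReader reader = new("filepath");
PdfDocument doc = new(reader);
PdfAcroForm form = PdfAcroForm.GetAcroForm(doc, false);
string value = form.GetField("FieldNameHere").GetValueAsString();
doc.Close(); // also closes the reader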
Works for iText 7.1.16

Related

Itext 7 - PdfReader is not opened with owner password Error

I am using this example for the latest iText 7 to fill in a document, and I am getting this error: iText.Kernel.Crypto.BadPasswordException: PdfReader is not opened with owner password
Looking around the net, I found that some people solved this error with PdfReader.unethicalreading = true;, but when I try the same code, it says there is no definition in PdfReader named unethicalreading.
Here is the Code I have:
string src = @"C:\test1.pdf";
string dest = @"C:\Test2.pdf";
PdfDocument pdfDoc = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("Name", out toSet);
toSet.SetValue("Some text");

You need to change your code like this:
string src = @"C:\test1.pdf";
string dest = @"C:\Test2.pdf";
PdfReader reader = new PdfReader(src);
// ignore the permissions tied to the owner password (PascalCase in iText 7 .NET)
reader.SetUnethicalReading(true);
PdfDocument pdfDoc = new PdfDocument(reader, new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.GetAcroForm(pdfDoc, true);
IDictionary<String, PdfFormField> fields = form.GetFormFields();
PdfFormField toSet;
fields.TryGetValue("Name", out toSet);
toSet.SetValue("Some text");
pdfDoc.Close(); // writes dest
This will allow you to go against the permissions that were defined by the original author of the document. This also proves that setting such permissions has become obsolete, because since PDF became an ISO standard, there is no longer a penalty for removing those permissions.

AcroForm PDF to normal PDF in c#

I have an AcroForm PDF (a PDF that can be edited), but I'm using an API to sign the PDF that requires a normal PDF, never an AcroForm one.
Is there any way to transform an AcroForm PDF into a normal one?
I tried making it read-only, but even though it cannot be edited, it is still an AcroForm PDF.

Following up on my comment, I assume you are using iTextSharp, even though you do not specify. With iTextSharp, I believe you need to flatten the form when you are done. Here is a simple example:
public void GeneratePDF(string filePath, List<PDFField> modifiedFields)
{
    var pdfReader = new PdfReader(filePath);
    var folderStructure = filePath.Split('\\');
    if (folderStructure.Length == 0) return;
    var currentFileName = folderStructure.Last();
    var newFilePath = string.Format("{0}{1}", Constants.SaveFormsPath,
        currentFileName.Replace(".pdf", DateTime.Now.ToString("MMddyyhhmmss") + ".pdf"));
    var pdfStamper = new PdfStamper(pdfReader, new FileStream(newFilePath, FileMode.Create));
    foreach (var field in modifiedFields.Where(f => f.Value != null))
    {
        pdfStamper.AcroFields.SetField(field.Name, field.Value);
    }
    // Flattening turns the filled-in fields into ordinary page content
    pdfStamper.FormFlattening = true;
    pdfStamper.Close();
    pdfReader.Close();
}
Ignoring the filename handling, it boils down to passing in a list of field names and values to set (this could be where you do your signature piece) and then setting the FormFlattening property on the stamper to true.
Here is another SO post where they used a similar technique for a slightly different issue; it may be of help: How to flatten already filled out PDF form using iTextSharp
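The same fill-then-flatten flow can be sketched on byte arrays, which keeps the file-naming logic out of the PDF code. This assumes iTextSharp 5.5.x; FormFiller and FillForm are illustrative names, and the dictionary stands in for the answer's List<PDFField>.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

public static class FormFiller
{
    // Fills the given fields of an AcroForm template and returns the new PDF.
    // flatten = true merges the field appearances into the page content,
    // so the result no longer has interactive form fields.
    // Assumes iTextSharp 5.5.x.
    public static byte[] FillForm(byte[] template, IDictionary<string, string> values, bool flatten)
    {
        var reader = new PdfReader(template);
        var output = new MemoryStream();
        var stamper = new PdfStamper(reader, output);
        foreach (KeyValuePair<string, string> pair in values)
        {
            if (pair.Value != null)   // skip unset values, like the answer's Where filter
            {
                stamper.AcroFields.SetField(pair.Key, pair.Value);
            }
        }
        stamper.FormFlattening = flatten;
        stamper.Close();              // writes the PDF and closes 'output'
        reader.Close();
        return output.ToArray();      // ToArray still works on a closed MemoryStream
    }
}
```

Passing flatten: true gives the "normal" PDF the signing API asks for; passing false keeps the form editable.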

Add new page to iTextSharp's PdfStamper with template and fields

I'm trying to add new page to a PdfStamper but this code doesn't add the template pdf fields to the stamper.
private void InsertNewPage(PdfStamper stamper, int pageNumber)
{
    var pdfReader = new PdfReader(UrlTemplateBlankPage);
    pdfReader.SelectPages("1");
    stamper.InsertPage(pageNumber, pdfReader.GetPageSize(1));
    stamper.GetOverContent(pageNumber).AddTemplate(stamper.GetImportedPage(pdfReader, 1), 0, 0);
    //This code doesn't work because the code before is not adding the form
    var pdfFormFields = stamper.AcroFields;
    var fieldKeys = pdfReader.AcroFields.Fields.Keys;
    foreach (var k in fieldKeys.ToList())
    {
        pdfFormFields.RenameField(k, k + string.Format("_{0:000}", pageNumber));
    }
}
I searched online but I can't find an answer about my problem.
The PDF template I'm adding has some fields added with Acrobat. I can't attach the template, but I can give you any other information you need.

I can't see how you instantiate the stamper. This is an example of how to read a PDF template and assign it to the stamper:
var reader = new PdfReader(TEMPLATE_PATH);
var pdfOutput = new FileStream(PDF_OUTPUT_PATH, FileMode.Create);
var stamper = new PdfStamper(reader, pdfOutput);
After that, you can set the fields using the SetField function:
stamper.AcroFields.SetField("FIELD1", "VALUE");
There is an option to make your fillable PDFs non-editable using:
stamper.FormFlattening = true;
Otherwise, your PDF remains fillable.
Once you have finished working with your files, close them:
stamper.Close();
reader.Close();
There is an example of how to use iTextSharp's PdfStamper at this link: http://www.codeproject.com/Tips/679606/Filling-PDF-Form-using-iText-PDF-Library

Reading text against the editable field in PDF to UI

On my website, clients upload different types of PDF templates that contain editable fields. I want to read the text and the editable fields from each PDF and display the text with the corresponding fields in my web form. I have found solutions for reading the text and the fields separately, but I am not able to map each field to its corresponding text.
I can read the text and get the fields using iTextSharp, but I can't map the text to the fields. For example, the PDF contains "FirstName: Thomas", and I want to display it in the UI as Firstname (label): Thomas (textbox).
Here is the sample code I have used to get all the fields:
public string GetPDFFields()
{
    string pdfTemplate = @"d:\1234.pdf";
    var pdfReader = new PdfReader(pdfTemplate);
    var outStream = new MemoryStream();
    var stamper = new PdfStamper(pdfReader, outStream);
    var form = stamper.AcroFields;
    var fieldKeys = form.Fields.Keys;
    StringBuilder sb = new StringBuilder();
    foreach (string fieldKey in fieldKeys)
    {
        sb.Append(form.GetField(fieldKey) + "\r\n");
    }
    return sb.ToString();
}

Try this code sample:
http://simpledotnetsolutions.wordpress.com/2012/04/08/itextsharp-few-c-examples/

How to get a list of the fields in an XFA form?

I am trying to get a simple list of all the fields in my XFA form. I am using this code:
private void ListFieldNames()
{
    string pdfTemplate = @"C:\Projects\iTextSharp\SReport.pdf";
    MemoryStream m = new MemoryStream();
    // title the form
    this.Text += " - " + pdfTemplate;
    // create a new PDF reader based on the PDF template document
    PdfReader pdfReader = new PdfReader(pdfTemplate);
    PdfStamper pdfStamper = new PdfStamper(pdfReader, m);
    AcroFields formFields = pdfStamper.AcroFields;
    AcroFields form = pdfReader.AcroFields;
    XfaForm xfa = form.Xfa;
    StringBuilder sb = new StringBuilder();
    sb.Append(xfa.XfaPresent ? "XFA form" : "AcroForm");
    sb.Append(Environment.NewLine);
    foreach (string key in form.Fields.Keys)
    {
        sb.Append(key);
        sb.Append(Environment.NewLine);
        txtFields.Text = sb.ToString();
    }
    txtFields.Text = sb.ToString();
}
But all I am getting is the XFA Form and not any fields. Any idea what I am doing wrong?
Thanks in advance

You've taken a code sample from chapter 8 of my book "iText in Action." The result of that code sample is consistent with what I wrote on page 273:
Running Listing 8.18 with this form as resource will give you the following result:
AcroForm
If your question is Any idea what I am doing wrong? then the answer is simple: you stopped reading on page 270, or you used a code sample without reading the accompanying documentation. How to fix this? Read the documentation!
If your question is Why don't I get any info about the fields? (which isn't your question, but let's assume it is), the answer is: you're using code to retrieve AcroForm fields, but your form doesn't contain any such fields. Your form is a pure XFA form, which means that all field information is stored as XML and XML only!
Suppose that you now want to know: How can I extract that XML? then you should go to the place where you found the example you copy/pasted.
That could be here:
http://itextpdf.com/examples/iia.php?id=164
Or maybe here: http://sourceforge.net/p/itextsharp/code/HEAD/tree/trunk/book/iTextExamplesWeb/iTextExamplesWeb/iTextInAction2Ed/Chapter08/XfaMovie.cs
Or even here: http://kuujinbo.info/iTextInAction2Ed/index.aspx?ch=Chapter08&ex=XfaMovie
This code snippet will return the complete XFA stream:
public string ReadXfa(PdfReader reader) {
    XfaForm xfa = new XfaForm(reader);
    XmlDocument doc = xfa.DomDocument;
    reader.Close();
    if (!string.IsNullOrEmpty(doc.DocumentElement.NamespaceURI)) {
        doc.DocumentElement.SetAttribute("xmlns", "");
        XmlDocument new_doc = new XmlDocument();
        new_doc.LoadXml(doc.OuterXml);
        doc = new_doc;
    }
    var sb = new StringBuilder(4000);
    var Xsettings = new XmlWriterSettings() {Indent = true};
    using (var writer = XmlWriter.Create(sb, Xsettings)) {
        doc.WriteTo(writer);
    }
    return sb.ToString();
}
Now look for the <xfa:datasets> tag; it will have a subtag <xfa:data> (probably empty if the form is empty) and a subtag <dd:dataDescription>. Inside the dataDescription tag, you'll find something that resembles XSD. That's what you need to know what the fields in the form are about.
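If you only need the field values, the `<xfa:data>` packet can be walked with plain LINQ to XML once you have the XFA string (for example the output of ReadXfa above). This is a sketch, not iText functionality: it assumes the standard data namespace http://www.xfa.org/schema/xfa-data/1.0/, treats every element without child elements as a field, and keys each value by its element path; XfaData and ReadDataFields are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XfaData
{
    // Standard namespace of the <xfa:datasets> / <xfa:data> packet (assumed unchanged).
    static readonly XNamespace XfaDataNs = "http://www.xfa.org/schema/xfa-data/1.0/";

    // Maps "subform/.../field" -> text for every leaf element under <xfa:data>.
    // Repeated subforms share a path, so later values overwrite earlier ones.
    public static Dictionary<string, string> ReadDataFields(string xfaXml)
    {
        var fields = new Dictionary<string, string>();
        XElement data = XDocument.Parse(xfaXml)
                                 .Descendants(XfaDataNs + "data")
                                 .FirstOrDefault();
        if (data == null) return fields;             // no data packet at all

        foreach (XElement leaf in data.Descendants().Where(e => !e.HasElements))
        {
            // Walk up from the leaf to just below <xfa:data>, then reverse.
            IEnumerable<string> path = leaf.AncestorsAndSelf()
                                           .TakeWhile(e => e != data)
                                           .Select(e => e.Name.LocalName)
                                           .Reverse();
            fields[string.Join("/", path)] = leaf.Value;
        }
        return fields;
    }
}
```

Usage: `var values = XfaData.ReadDataFields(ReadXfa(reader));`. For the field definitions rather than the values, look at `<dd:dataDescription>` as described above; this sketch ignores it.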
I could go on guessing questions, such as: How do I fill out such a form? By using the method fillXfaForm(); How can I flatten such a form? By using XFA Worker (which is a closed source library written on top of iTextSharp), but let's keep those questions for another thread ;-)
