I'm trying to change the text in some PDF annotations using iTextSharp. Here is my code:
void changeAnnotations(string inputPath, string outputPath)
{
PdfReader pdfReader = new PdfReader(inputPath);
PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(outputPath, FileMode.Create));
//get the PdfDictionary of the 1st page
PdfDictionary pageDict = pdfReader.GetPageN(1);
//get annotation array
PdfArray annotArray = pageDict.GetAsArray(PdfName.ANNOTS);
//iterate through annotation array
int size = annotArray.Size;
for (int i = 0; i < size; i++)
{
//get value of /Contents
PdfDictionary dict = annotArray.GetAsDict(i);
PdfString contents = dict.GetAsString(PdfName.CONTENTS);
//check if /Contents key exists
if (contents != null)
{
//set new value
dict.Put(PdfName.CONTENTS, new PdfString("value has been changed"));
}
}
pdfStamper.Close();
}
When I open the output file in Adobe Reader, none of the text has changed in any of the annotations. How should I be setting the new value in an annotation?
UPDATE: I've found that the value is being changed in the popup box that appears when I click on the annotation. And in some cases, when I modify this value in the popup box, the change is then applied to the annotation.
As the OP clarified in a comment:
This annotation is a FreeText, how do I find and change the text that's displayed in this text box?
Free text annotations allow a number of mechanisms to set the displayed text:
A pre-formatted appearance stream, referenced by the N entry in the AP dictionary
A rich text string with a default style string given in RC and DS respectively
A default appearance string applied to the contents given in DA and Contents respectively
(For details cf. the PDF specification ISO 32000-1 section 12.5.6.6 Free Text Annotations)
If you want to change the text using one of these mechanisms, make sure you remove or adjust the entries for the other mechanisms; otherwise your change might not be visible at all, or might be visible in some viewers but not in others.
I can't figure out how to determine if there is an appearance stream. Is that the /AP property? I checked that for one of the annotations and it's a dictionary with a single entry whose value is 28 0 R.
So that annotation does indeed come with an appearance stream. The single entry whose value is 28 0 R presumably has the key N, which indicates the normal appearance. 28 0 R is a reference to the indirect object with object number 28 and generation number 0.
If you want to change the text content but do not want to deal with the formatting details, you should remove the AP entry.
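A minimal sketch of that approach for iTextSharp 5, assuming the same reader/stamper setup as in the question. For every FreeText annotation on the first page it sets a new Contents value and removes the AP and RC entries, so a viewer has to rebuild the appearance from DA and Contents. The class and method names are hypothetical, and how a particular viewer regenerates the appearance is not guaranteed.

```csharp
using System.IO;
using iTextSharp.text.pdf;

static class FreeTextEditor
{
    // Hypothetical helper: replaces the text of all FreeText annotations on page 1.
    public static void ChangeFreeTextAnnotations(string inputPath, string outputPath, string newText)
    {
        PdfReader pdfReader = new PdfReader(inputPath);
        using (FileStream output = new FileStream(outputPath, FileMode.Create))
        {
            PdfStamper pdfStamper = new PdfStamper(pdfReader, output);
            PdfArray annotArray = pdfReader.GetPageN(1).GetAsArray(PdfName.ANNOTS);
            if (annotArray != null)
            {
                for (int i = 0; i < annotArray.Size; i++)
                {
                    PdfDictionary dict = annotArray.GetAsDict(i);
                    if (dict == null) continue;
                    // Only touch free text annotations (/Subtype /FreeText).
                    if (!PdfName.FREETEXT.Equals(dict.GetAsName(PdfName.SUBTYPE))) continue;

                    // Mechanism 3: plain Contents, styled by the existing DA entry.
                    dict.Put(PdfName.CONTENTS, new PdfString(newText));
                    // Drop the competing mechanisms: the appearance stream and the rich text.
                    dict.Remove(PdfName.AP);
                    dict.Remove(new PdfName("RC"));
                }
            }
            pdfStamper.Close();
        }
        pdfReader.Close();
    }
}
```

Removing RC as well as AP follows the advice above about not leaving stale values behind for the other mechanisms.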
I'm trying to write a PDF file with a header, logo and table using iText7 in C#.
I have never used iText7 before, so I don't know how to write the text of a paragraph at a fixed position.
Right now I am just using tab stops as anchors for my text. The problem is that when a string is too long, everything that follows in the line is shifted by one tab stop and the "columns" in the header are no longer aligned.
The following picture shows what I want to achieve:
This picture shows what happens if a string gets too long (in this example I used a long username):
Here is a code snippet I use to write one line of the header:
// generate 8 tabstops to split pdf in equal sections
List<TabStop> tabStops = new List<TabStop>();
for (uint i = 0; i < 8; i++)
{
float tabSize = pageSize.GetWidth() / 8;
tabStops.Add(new TabStop(tabSize, TabAlignment.LEFT));
}
Paragraph p = new Paragraph();
p.SetFontSize(10);
// add tabstops to paragraph for text alignment
p.AddTabStops(tabStops);
// add title of header
p.Add(title1).Add("\n");
// write line one of header
p.Add("Serie: ").Add(new Tab()).Add(info.serial.ToString())
.Add(new Tab()).Add(new Tab())
.Add("Input CSV: ").Add(new Tab()).Add(info.inputFileName)
.Add(new Tab()).Add(new Tab()).Add("Out-Series: ")
.Add(info.serial.ToString()).Add("\n");
// line 2...
p.Add("User: ").Add(new Tab()).Add(info.username)
.Add(new Tab()).Add(new Tab()).Add(new Tab())
.Add("qPCR-Datei: ").Add(new Tab()).Add(info.qpcr1FileName)
.Add(new Tab()).Add(new Tab()).Add(new Tab())
.Add("STR-Out: ").Add(strFileName).Add("\n");
I hope someone can show me a better way to align the text, or point me to where I should look.
Another useful tip would be how to keep line breaks within the same tab stop section. For example, if a file name gets too long (see "STR-Out: " in the picture), the line is broken, but the part of the file name on the new line should stay at the tab stop behind "STR-Out: ".
Instead of tabs and tab stops, use tables and cells so that the alignment is correct.
Create a table with 8 columns (Label, Value, Space, Label, Value, Space, Label, Value).
Use this sample code:
PdfPTable table = new PdfPTable(8);
PdfPCell cell;
cell = new PdfPCell();
cell.setRowspan(2); //only if spanning needed
table.addCell(cell);
for(int aw=0;aw<8;aw++){
table.addCell("hi");
}
Thanks @shihabudheenk for pointing me in the right direction with the idea of using a table.
I just had to adapt some code to iText7.
First thing is that
Table headerTable = new Table().SetBorder(Border.NO_BORDER);
has no effect in iText7, you have to set the option for each cell individually like:
Cell cell = new Cell().SetBorder(Border.NO_BORDER);
but here is the problem that
cell.Add()
in iText7 only accepts an IBlockElement as parameter, so I have to use it like this:
cell.Add(new Paragraph("text"));
which is pretty annoying to repeat for every cell. Therefore I used a removeBorder function as suggested here
So the final code I use to build the header looks like this:
// initialize table with fixed column sizes
Table headerTable = new Table(UnitValue.CreatePercentArray(
new[] { 1f, 1.2f, 1f, 1.8f, 0.7f, 2.5f })).SetFixedLayout();
// write headerline 1
headerTable.AddCell("Serie: ").AddCell(info.serial.ToString())
.AddCell("Input CSV: ")
.AddCell(info.inputFileName)
// write remaining lines...
....
// remove border from all cells
removeBorder(headerTable);
private static void removeBorder(Table table)
{
foreach (IElement iElement in table.GetChildren())
{
((Cell)iElement).SetBorder(Border.NO_BORDER);
}
}
With iTextSharp, I can retrieve all the form fields that are present in a PDF form. I'm using Adobe Acrobat to edit the PDF, where I see that every field has a position attribute denoting where the field resides in the form.
So my question is: can I read that value?
For example, if I have a form field Name in a PDF form, can I get the position value of this field, like left 0.5 inches, right 2.5 inches, top 2 inches, bottom 2 inches?
Right now I'm retrieving the form fields with the below code :
string pdfTemplate = @"D:\abc.pdf";
PdfReader reader = new PdfReader(pdfTemplate);
var fields = reader.AcroFields;
int ffRadio = 1 << 15; //Per spec, 16th bit is radio button
int ffPushbutton = 1 << 16; //17th bit is push button
int ff;
//Loop through each field
foreach (var f in fields.Fields)
{
String type = "";
String name = f.Key.ToString();
String value = fields.GetField(f.Key);
//Get the widgets for the field (note, this could return more than 1, this should be checked)
PdfDictionary w = f.Value.GetWidget(0);
//See if it is a button-like object (/Ft == /Btn)
if (!w.Contains(PdfName.FT) || !w.Get(PdfName.FT).Equals(PdfName.BTN))
{
type = "T";
}
else
{
//Get the optional field flags, if they don't exist then just use zero
ff = (w.Contains(PdfName.FF) ? w.GetAsNumber(PdfName.FF).IntValue : 0);
if ((ff & ffRadio) == ffRadio)
{
//Is Radio
type = "R";
}
else if (((ff & ffRadio) != ffRadio) && ((ff & ffPushbutton) != ffPushbutton))
{
//Is Checkbox
type = "C";
}
else
{
//Regular button
type = "B";
}
}
//MessageBox.Show(type + "=>" + name + "=>" + value);
FormFields fld = new FormFields(name, type, value, "inputfield" +form_fields.Count);
form_fields.Add(fld);
if (type.Equals("T"))
addContent(form_fields.Count);
}
I was about to close this question as a duplicate of Find field absolute position and dimension by acrokey but that's a Java answer, and although most developers have no problem converting the Java to C#, it may be helpful for some developers to get the C# answer.
Fields in a PDF are visualized using widget annotations. One field can correspond to several of those annotations. For instance, you could have a field named name that is visualized on every page. In this case, the value of this field would be shown on every page.
There's a GetFieldPositions() method that returns a list of positions, one for every widget annotation.
This is some code I copied from the answer to the question iTextSharp GetFieldPositions to SetSimpleColumn
IList<AcroFields.FieldPosition> fieldPositions = fields.GetFieldPositions("fieldNameInThePDF");
if (fieldPositions == null || fieldPositions.Count <= 0) throw new ApplicationException("Error locating field");
AcroFields.FieldPosition fieldPosition = fieldPositions[0];
left = fieldPosition.position.Left;
right = fieldPosition.position.Right;
top = fieldPosition.position.Top;
bottom = fieldPosition.position.Bottom;
If one field corresponds with one widget annotation, then left, right, top, and bottom will give you the left, right, top and bottom coordinate of the field. The width of the field can be calculated like this: right - left; the height like this: top - bottom. These values are expressed in user units. By default there are 72 user units in one inch.
If your document contains more than one page, then fieldPosition.page will give you the page number where you'll find the field.
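Since the question asks for inches, the conversion from user units can be sketched as follows. This is a small illustration, assuming the default of 72 user units per inch (that is, the page does not define its own UserUnit value); the class and method names are hypothetical.

```csharp
using System;

static class UserUnits
{
    // Default user space: 72 units per inch (assumption: no custom UserUnit on the page).
    const float UnitsPerInch = 72f;

    // Converts a coordinate or length given in user units to inches.
    public static float ToInches(float userUnits)
    {
        return userUnits / UnitsPerInch;
    }
}
```

For example, UserUnits.ToInches(fieldPosition.position.Left) gives the left coordinate in inches. Note that PDF coordinates normally have their origin in the lower-left corner of the page, so Top and Bottom are measured upward from the bottom of the page.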
All of this is documented on http://developers.itextpdf.com/
I'm using LocationTextExtractionStrategy combined with a custom ITextExtractionStrategy class to read a PDF. With this code I can read portions of a document based on coordinates without problems.
Now I have a PDF that looks like the others, but when I try to read it I get text like this:
2 D 80 D 8 1 M 13M2 R V / 8 3B 3 3 710 022/F//0 R8 8 1 0 / 3
This is the code I'm using:
private static string ReadFilePart(string fileName,int pageNumber, int fromLeft, int fromBottom, int width, int height)
{
var rect = new System.util.RectangleJ(fromLeft, fromBottom, width, height);
var pdfReader = new PdfReader(fileName);
var filters = new RenderFilter[1];
filters[0] = new RegionTextRenderFilter(rect);
var strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filters);
var pageText = PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, new LimitedTextStrategy(strategy));
pdfReader.Close();
return pageText;
}
private class LimitedTextStrategy : ITextExtractionStrategy
{
public readonly ITextExtractionStrategy textextractionstrategy;
public LimitedTextStrategy(ITextExtractionStrategy strategy)
{
textextractionstrategy = strategy;
}
public void RenderText(TextRenderInfo renderInfo)
{
foreach (TextRenderInfo info in renderInfo.GetCharacterRenderInfos())
{
textextractionstrategy.RenderText(info);
}
}
public string GetResultantText()
{
return textextractionstrategy.GetResultantText();
}
public void BeginTextBlock()
{
textextractionstrategy.BeginTextBlock();
}
public void EndTextBlock()
{
textextractionstrategy.EndTextBlock();
}
public void RenderImage(ImageRenderInfo renderInfo)
{
textextractionstrategy.RenderImage(renderInfo);
}
}
I cannot share the PDF file due to sensitive data.
Update
If I replace LocationTextExtractionStrategy with SimpleTextExtractionStrategy, it recognizes the full row without strange characters (PDF structure?).
Update 2
I can now share the file! The problematic pages are pages 2 and 3.
PDF file
Test solution to read the file
Update 3
mkl pointed me in the right direction, and I fixed it by adding FirstChar, LastChar and Widths entries with default values to all fonts that were missing them.
private static PdfReader FontFix(PdfReader pdfReader)
{
for (var p = 1; p <= pdfReader.NumberOfPages; p++)
{
var dic = pdfReader.GetPageN(p);
var resources = dic.GetAsDict(PdfName.RESOURCES);
var fonts = resources?.GetAsDict(PdfName.FONT);
if (fonts == null) continue;
foreach (var key in fonts.Keys)
{
var font = fonts.GetAsDict(key);
var firstChar = font.Get(PdfName.FIRSTCHAR);
if(firstChar==null)
font.Put(PdfName.FIRSTCHAR, new PdfNumber(0));
var lastChar = font.Get(PdfName.LASTCHAR);
if (lastChar == null)
font.Put(PdfName.LASTCHAR, new PdfNumber(255));
var widths= font.GetAsArray(PdfName.WIDTHS);
if (widths == null)
{
// default width of 600 for each of the 256 character codes
var array = Enumerable.Repeat(600, 256).ToArray();
font.Put(PdfName.WIDTHS, new PdfArray(array));
}
}
}
return pdfReader;
}
The error in the PDF
The cause of this issue is that the PDF contains one incomplete font dictionary. Most font dictionaries in the PDF are complete but there is one exception, the dictionary in object 28 used for the font Fo0 in the shared resources which is used to "fill in" the fields on pages two and three:
<<
/Name /Fo0
/Subtype /TrueType
/BaseFont /CourierNew
/Type /Font
/Encoding /WinAnsiEncoding
>>
In particular this font dictionary does not contain the required Widths entry whose value would be an array of the widths of the font glyphs.
Thus, iTextSharp has no idea how wide the glyphs actually are and uses 0 as default value.
As an aside, such incomplete font dictionaries are allowed (albeit deprecated) for a very limited set of Type 1 fonts, the so-called standard 14 fonts. The TrueType font "CourierNew" obviously is not among them. But the developer who created the software responsible for the incomplete structure above probably did not care to look into the PDF specification and simply followed the example of those special Type 1 fonts.
The effect on your code
In your LimitedTextStrategy.RenderText implementation
public void RenderText(TextRenderInfo renderInfo)
{
foreach (TextRenderInfo info in renderInfo.GetCharacterRenderInfos())
{
textextractionstrategy.RenderText(info);
}
}
you split the renderInfo (describing a longer string) into multiple TextRenderInfo instances (describing one glyph each). If the font of renderInfo is the critical Fo0, all those TextRenderInfo instances have the same position because iTextSharp assumed the glyph widths to be 0.
...using the LocationTextExtractionStrategy
Those TextRenderInfo instances then are filtered and forwarded to the LocationTextExtractionStrategy which later on sorts them by position. As the positions coincide and the sorting algorithm used does not keep elements with the same position in their original order, this sorting effectively shuffles them. Eventually you get all the corresponding characters in a chaotic order.
...using the SimpleTextExtractionStrategy
In this case those TextRenderInfo instances then are filtered and forwarded to the SimpleTextExtractionStrategy which does not sort them but instead adds the respectively corresponding characters to the result string. If in the content stream the text showing operations occur in reading order, the result returned by the strategies is in proper reading order, too.
Why does Adobe Reader display the text in proper order?
If confronted with a broken PDF, different programs can attempt different strategies to cope with the situation.
In the case at hand, Adobe Reader most likely searches for a CourierNew TrueType font program in the operating system and uses the width information from there. This most likely is what the creator of that broken font structure hoped for.
I have a PDF document that contains comments of 2 types:
1. Rectangle
2. Text Box
I want to get the values from the Text Boxes with C# and iTextSharp.
The text boxes and rectangles you're referring to are called Annotations. Annotations are defined as dictionaries and they are listed per page.
In other words: you need to create a PdfReader instance and get the ANNOTS from each page:
PdfReader reader = new PdfReader("your.pdf");
for (int i = 1; i <= reader.NumberOfPages; i++) {
PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
if (array == null) continue;
for (int j = 0; j < array.Size; j++) {
PdfDictionary annot = array.GetAsDict(j);
PdfString text = annot.GetAsString(PdfName.CONTENTS);
...
}
}
In the above code sample, I have a PdfDictionary named annot, from which I can extract the Contents. You may be interested in some other entries too (for instance the name of the annotation, if any). Please inspect all the keys that are available in the annot object in case the Contents entry isn't what you're looking for.
Replace the dots with whatever you want to do with the text. PdfString has several methods that reveal its contents.
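As one possible way to fill in the dots, the sketch below collects the text of the text boxes only. It assumes that the text boxes are free text annotations (/Subtype /FreeText), which is what distinguishes them from the rectangles, and it uses ToUnicodeString() to read the PdfString. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

static class TextBoxReader
{
    // Hypothetical helper: returns the Contents of every FreeText annotation in the file.
    public static List<string> ReadTextBoxes(string path)
    {
        var result = new List<string>();
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
                if (array == null) continue; // page without annotations
                for (int j = 0; j < array.Size; j++)
                {
                    PdfDictionary annot = array.GetAsDict(j);
                    if (annot == null) continue;
                    // Skip everything that is not a text box, e.g. the rectangles.
                    if (!PdfName.FREETEXT.Equals(annot.GetAsName(PdfName.SUBTYPE))) continue;
                    PdfString text = annot.GetAsString(PdfName.CONTENTS);
                    if (text != null) result.Add(text.ToUnicodeString());
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return result;
    }
}
```

If a text box turns out to have an empty Contents entry, its visible text may live only in the rich text (RC) entry or in the appearance stream, so inspecting the other keys of the annotation, as suggested above, is still worthwhile.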
DISCLAIMER: I'm the original developer of iText (I always assume that people already know this, but I was once downvoted because I didn't add this disclaimer).
I am generating a PDF booklet from a database and need to generate a table of contents with page numbers. For example, with two chapters the page numbers look like this:
=============================
Content table
Chapter 1 ----- 3
Chapter 2 ----- 17
=============================
The text "Chapter 1 ----- " is a normal paragraph. But the page number "3" has to be produced using a PdfTemplate because it can only be known later, and a PdfTemplate is absolutely positioned. How can I know where to position the PdfTemplate? Is this the right approach, or should I use a different method?
I've extracted a bit of code to get you on your way. This code lets you place text anywhere on a page using x and y coordinates. You may actually prefer iTextSharp's built-in paragraph and margin support, but this should still be useful; it is VB.NET and just needs converting to C#.
Dim stamper As PdfStamper
Dim templateReader As PdfReader = New PdfReader(yourFileName)
Dim currentPage As PdfImportedPage = stamper.GetImportedPage(templateReader, 1)
stamper.InsertPage(1, PageSize.A4)
Dim cb As PdfContentByte = stamper.GetOverContent(1)
cb.AddTemplate(currentPage, 0, 0)
Loop over this next bit for each element you want to add:
cb.BeginText()
cb.SetFontAndSize(bf, 12)
cb.SetColorFill(color) 'create a color object to represent the colour you want
cb.ShowTextAligned(1, "Content Table", x, y, 0) 'pass in the x & y of the element
cb.EndText()