PDF form fields position retrieval with iText

With iTextSharp, I can retrieve all the form fields that are present in a PDF form. I'm using Adobe Acrobat Reader to edit the PDF, where I see that every field has a position attribute that denotes where the field resides in the form.
So my question is: can I read that value?
For example, if I have a form field Name in a PDF form, can I get the position value of this field, like left 0.5 inches, right 2.5 inches, top 2 inches, bottom 2 inches?
Right now I'm retrieving the form fields with the code below:
string pdfTemplate = @"D:\abc.pdf";
PdfReader reader = new PdfReader(pdfTemplate);
var fields = reader.AcroFields;
int ffRadio = 1 << 15; //Per spec, 16th bit is radio button
int ffPushbutton = 1 << 16; //17th bit is push button
int ff;
//Loop through each field
foreach (var f in fields.Fields)
{
    String type = "";
    String name = f.Key.ToString();
    String value = fields.GetField(f.Key);
    //Get the widgets for the field (note, this could return more than 1, this should be checked)
    PdfDictionary w = f.Value.GetWidget(0);
    //See if it is a button-like object (/Ft == /Btn)
    if (!w.Contains(PdfName.FT) || !w.Get(PdfName.FT).Equals(PdfName.BTN))
    {
        type = "T";
    }
    else
    {
        //Get the optional field flags, if they don't exist then just use zero
        ff = (w.Contains(PdfName.FF) ? w.GetAsNumber(PdfName.FF).IntValue : 0);
        if ((ff & ffRadio) == ffRadio)
        {
            //Is Radio
            type = "R";
        }
        else if (((ff & ffRadio) != ffRadio) && ((ff & ffPushbutton) != ffPushbutton))
        {
            //Is Checkbox
            type = "C";
        }
        else
        {
            //Regular button
            type = "B";
        }
    }
    //MessageBox.Show(type + "=>" + name + "=>" + value);
    FormFields fld = new FormFields(name, type, value, "inputfield" + form_fields.Count);
    form_fields.Add(fld);
    if (type.Equals("T"))
        addContent(form_fields.Count);
}

I was about to close this question as a duplicate of Find field absolute position and dimension by acrokey but that's a Java answer, and although most developers have no problem converting the Java to C#, it may be helpful for some developers to get the C# answer.
Fields in a PDF are visualized using widget annotations. One field can correspond to several such annotations. For instance, you could have a field named name that is visualized on every page. In this case, the value of this field would be shown on every page.
There's a GetFieldPositions() method that returns a list of positions, one for every widget annotation.
This is some code I copied from the answer to the question iTextSharp GetFieldPositions to SetSimpleColumn
IList<AcroFields.FieldPosition> fieldPositions = fields.GetFieldPositions("fieldNameInThePDF");
if (fieldPositions == null || fieldPositions.Count <= 0)
    throw new ApplicationException("Error locating field");
//Take the first widget annotation; a field may have more than one
AcroFields.FieldPosition fieldPosition = fieldPositions[0];
float left = fieldPosition.position.Left;
float right = fieldPosition.position.Right;
float top = fieldPosition.position.Top;
float bottom = fieldPosition.position.Bottom;
If one field corresponds with one widget annotation, then left, right, top, and bottom will give you the left, right, top and bottom coordinate of the field. The width of the field can be calculated like this: right - left; the height like this: top - bottom. These values are expressed in user units. By default there are 72 user units in one inch.
If your document contains more than one page, then fieldPosition.page will give you the page number where you'll find the field.
All of this is documented on http://developers.itextpdf.com/
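To express those coordinates in inches, as asked in the question, the values can be divided by 72. The sketch below is only an illustration: FieldGeometry and its methods are hypothetical helpers, not part of iTextSharp, and it assumes the default of 72 user units per inch and a page whose coordinates are measured upward from the bottom edge.

```csharp
using System;

// Hypothetical helper for illustration; not an iTextSharp class.
public static class FieldGeometry
{
    // Assumption: the default PDF user space of 72 user units per inch.
    public const float UnitsPerInch = 72f;

    // Converts a length or coordinate in user units to inches.
    public static float ToInches(float userUnits)
    {
        return userUnits / UnitsPerInch;
    }

    // Distance in inches from the top edge of the page to a y coordinate,
    // given the page height in user units (y is measured from the bottom).
    public static float FromTopInches(float y, float pageHeight)
    {
        return (pageHeight - y) / UnitsPerInch;
    }
}
```

With the values from the snippet above, FieldGeometry.ToInches(left) would give the left offset in inches, and FieldGeometry.FromTopInches(top, pageHeight) the distance of the field's upper edge from the top of the page, where pageHeight is the height of the page that contains the field.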

Related

Get bounds of glyphs in PDF with GemBox

Goal: extract a value from a specific location inside a PDF page. In GemBox.Pdf, I can extract text elements including their bounds and content, but:
Problem: a text element can have a complex structure, with each glyph being positioned using individual settings.
Consider this common example of a page header:
Billing Info Date: 02/02/20222
Company Ltd. Order Number: 0123456789
123 Main Street Name: Smith, John
Let's say, I want to get the order number (0123456789) from the document, knowing its precise position on the page. But in practice, often enough the entire line would be one single text element, with the content SO CompanyOrder Number:0123456789, and all positioning and spacing done via offsets and indices only. I can get the bounds and text of the entire line, but I need the bounds (and value) of each character/glyph, so I can combine them into "words" (= character sequences, separated by whitespace or large offsets).
I know this is definitely possible in other libraries, but this question is specific to GemBox. It seems to me that all the necessary implementations should already be there; just not much is exposed in the API.
In iTextSharp I can get the bounds for each single glyph, like this:
// iTextSharp 5.2.1.0
public class GlyphExtractionStrategy : LocationTextExtractionStrategy
{
    public override void RenderText(TextRenderInfo renderInfo)
    {
        var segment = renderInfo.GetBaseline();
        var chunk = new TextChunk(
            renderInfo.GetText(),
            segment.GetStartPoint(),
            segment.GetEndPoint(),
            renderInfo.GetSingleSpaceWidth(),
            renderInfo.GetAscentLine(),
            renderInfo.GetDescentLine()
        );
        // glyph infos
        var glyph = chunk.Text;
        var left = chunk.StartLocation[0];
        var top = chunk.StartLocation[1];
        var right = chunk.EndLocation[0];
        var bottom = chunk.EndLocation[1];
    }
}

var reader = new PdfReader(bytes);
var strategy = new GlyphExtractionStrategy();
PdfTextExtractor.GetTextFromPage(reader, pageNumber: 1, strategy);
reader.Close();
Is this possible in GemBox? If so, that would be helpful, because we already have the code to combine the glyphs into "words".
Currently, I can somewhat work around this using regex, but this is not always possible and also way too technical for end users to configure.

Try this latest NuGet package; we added the PdfTextContent.GetGlyphOffsets method:
Install-Package GemBox.Pdf -Version 17.0.1128-hotfix
Here is how you can use it:
using (var document = PdfDocument.Load("input.pdf"))
{
    var page = document.Pages[0];
    var enumerator = page.Content.Elements.All(page.Transform).GetEnumerator();
    while (enumerator.MoveNext())
    {
        if (enumerator.Current.ElementType != PdfContentElementType.Text)
            continue;
        var textElement = (PdfTextContent)enumerator.Current;
        var text = textElement.ToString();
        int index = text.IndexOf("Number:");
        if (index < 0)
            continue;
        index += "Number:".Length;
        for (int i = index; i < text.Length; i++)
        {
            if (text[i] == ' ')
                index++;
            else
                break;
        }
        var bounds = textElement.Bounds;
        enumerator.Transform.Transform(ref bounds);
        string orderNumber = text.Substring(index);
        double position = bounds.Left + textElement.GetGlyphOffsets().Skip(index - 1).First();
        // TODO ...
    }
}

Interop Word - Insert Mergefield to the End of the Range [duplicate]

I have created a 1x3 table as my header in word. This is how I want it to look like.
LeftText MiddleText PageNumber:
I want the PageNumber cell to look like this -
Page: X of Y
I have managed to do cell (1,1) and (1,2). I found this to help me with cell (1,3) but it is not working as I like. I know how to get the total count of the document. I'm not sure how to implement it properly.
Range rRange = restheaderTable.Cell(1, 3).Range;
rRange.End = rRange.End - 1;
oDoc.Fields.Add(rRange, Type: WdFieldType.wdFieldPage, Text: "Page Number: ");
I can't even get the Text "Page Number: " to display in the cell. All it has is a number right now.

The field enumeration you're looking for is Word.WdFieldType.wdFieldNumPages.
The next hurdle is how to construct field + text + field, as Word doesn't behave "logically" when things are added in this order. The target point remains before the field that's inserted. So it's either necessary to work backwards, or to move the target range after each bit of content.
Here's some code I have that demonstrates the latter approach. Inserting text and inserting fields are in two separate procedures that take the target Range and the text (whether literal or the field text) as parameters. This way the field code can be built up logically (Page x of n). The target Range is returned from both procedures, already collapsed to its end-point, ready for appending further content.
Note that I prefer to construct a field using the field's text (including any field switches) rather than specifying a field type (the WdFieldType enumeration). This provides greater flexibility. I also highly recommend setting the PreserveFormatting parameter to false as the true setting can result in very odd formatting when fields are updated. It should only be used in very specific instances (usually involving linked tables).
private void btnInsertPageNr_Click(object sender, EventArgs e)
{
    getWordInstance();
    Word.Document doc = null;
    if (wdApp.Documents.Count > 0)
    {
        doc = wdApp.ActiveDocument;
        Word.Range rngHeader = doc.Sections[1].Headers[Microsoft.Office.Interop.Word.WdHeaderFooterIndex.wdHeaderFooterPrimary].Range;
        if (rngHeader.Tables.Count > 0)
        {
            Word.Table tbl = rngHeader.Tables[1];
            Word.Range rngPageNr = tbl.Range.Cells[tbl.Range.Cells.Count].Range;
            //Collapse the range so that it's within the cell and
            //doesn't include the end-of-cell markers
            object oCollapseStart = Word.WdCollapseDirection.wdCollapseStart;
            rngPageNr.Collapse(ref oCollapseStart);
            rngPageNr = InsertNewText(rngPageNr, "Page ");
            rngPageNr = InsertAField(rngPageNr, "Page");
            rngPageNr = InsertNewText(rngPageNr, " of ");
            rngPageNr = InsertAField(rngPageNr, "NumPages");
        }
    }
}

private Word.Range InsertNewText(Word.Range rng, string newText)
{
    object oCollapseEnd = Word.WdCollapseDirection.wdCollapseEnd;
    rng.Text = newText;
    rng.Collapse(ref oCollapseEnd);
    return rng;
}

private Word.Range InsertAField(Word.Range rng, string fieldText)
{
    object oCollapseEnd = Word.WdCollapseDirection.wdCollapseEnd;
    object unitCharacter = Word.WdUnits.wdCharacter;
    object oOne = 1;
    //"missing" is presumably a class-level field holding System.Reflection.Missing.Value
    Word.Field fld = rng.Document.Fields.Add(rng, missing, fieldText, false);
    Word.Range rngField = fld.Result;
    rngField.Collapse(ref oCollapseEnd);
    rngField.MoveStart(ref unitCharacter, ref oOne);
    return rngField;
}

Creating a sudoku. Should I use a while statement for this code?

I'm making a sudoku in a Windows Forms application.
I have 81 textboxes and I have named them all textBox1a, textBox1b... textBox2a, textBox2b...
I want to make it so that if any of the textboxes, in any of the rows, is equal to any other textbox in the same row, then both will get the background color red while the textboxes are equal.
I tried using this code just for test:
private void textBox1a_TextChanged_1(object sender, EventArgs e)
{
    while (textBox1a.Text == textBox1b.Text)
    {
        textBox1a.BackColor = System.Drawing.Color.Red;
        textBox1b.BackColor = System.Drawing.Color.Red;
    }
}
It didn't work, and I don't know where I should put all this code, I know I shouldn't have it in the textboxes.
Should I use a code similar to this or is it totally wrong?

You want to iterate over the collection of text boxes just once, comparing the changed box to each of the others. If you have your textboxes in an array (let's call it textBoxes), and know which one was just changed (e.g. from the TextChanged handler), you could do:
void highlightDuplicates(int i) // i is the index of the box that was changed
{
    string iVal = textBoxes[i].Text;
    // an empty box cannot be a duplicate
    if (iVal == "") return;
    for (int j = 0; j < 81; j++) // 81 boxes, indices 0..80
    {
        // don't compare to self
        if (i == j) continue;
        if (textBoxes[j].Text == iVal)
        {
            textBoxes[i].BackColor = System.Drawing.Color.Red;
            textBoxes[j].BackColor = System.Drawing.Color.Red;
        }
    }
}
If you wanted to get fancier, you could put your data in something like Dictionary<int, TextBox>, where the key is the value and the TextBox is a reference to the text box with that value. Then you can quickly test for duplicate values with Dictionary.ContainsKey() and color the matching text box by looking it up under that key.

I think your current code would result in an infinite loop. The textboxes' values can't change while you are still in the event handler, so that loop would never exit.
If all of your boxes are named according to one convention, you could do something like this. More than one input can use the same handler, so you can just assign this handler to all the boxes.
The following code is not tested and may contain errors
private void textBox_TextChanged(object sender, EventArgs e)
{
    var thisBox = sender as TextBox;
    //given name like "textBox1a"
    var boxNumber = thisBox.Name.Substring(7, 1);
    var boxLetter = thisBox.Name.Substring(8, 1);
    //numbers (horizontal?)
    for (int i = 1; i <= 9; i++)
    {
        if (i.ToString() == boxNumber)
            continue; //don't compare to self
        //Controls.Find looks a Windows Forms control up by name (FirstOrDefault needs System.Linq)
        var otherBox = Controls.Find("textBox" + i + boxLetter, true).FirstOrDefault() as TextBox;
        if (otherBox != null && otherBox.Text == thisBox.Text)
        {
            thisBox.BackColor = System.Drawing.Color.Red;
            otherBox.BackColor = System.Drawing.Color.Red;
        }
    }
    //letters (vertical?)
    for (int i = 1; i <= 9; i++)
    {
        var j = ConvertNumberToLetter(i); //up to you how to do this
        if (j == boxLetter)
            continue; //don't compare to self
        var otherBox = Controls.Find("textBox" + boxNumber + j, true).FirstOrDefault() as TextBox;
        if (otherBox != null && otherBox.Text == thisBox.Text)
        {
            thisBox.BackColor = System.Drawing.Color.Red;
            otherBox.BackColor = System.Drawing.Color.Red;
        }
    }
}

I believe it is more effective to create an array (or a List) of integers and compare the values in memory, rather than comparing them in the UI (user interface).
For instance, you could:
1) Create an array of 81 integers.
2) Every time the user enters a new number, search for it in that array. If it is found, set the textbox to red; otherwise, add the new value to the array.
3) Handle the Enter event of all the textboxes with a single handler (subscribe the same method to Text1.Enter, Text2.Enter, Text3.Enter, and so forth).
Something like:
int[] numbersByUser = new int[81];

private void TextBox_Enter(object sender, EventArgs e)
{
    var box = (TextBox)sender;
    int userEntry;
    if (!int.TryParse(box.Text, out userEntry)) return; //ignore empty or non-numeric input
    //assumption: each textbox's Tag holds its cell index (0..80)
    int index = (int)box.Tag;
    if (Array.IndexOf(numbersByUser, userEntry) >= 0)
    {
        box.ForeColor = System.Drawing.Color.Red;
    }
    else
    {
        numbersByUser[index] = userEntry;
    }
}

You should have a two-dimensional array of numbers (it could be one-dimensional, but two makes more sense); let's assume it's called Values. I suggest that you give each textbox an incrementing number in its Tag (starting top left, going right, then the next row). Now you can do the following:
All TextBox Changed events can point to the same function. The function then takes the tag to figure out the position in the two-dimensional array. (X coordinate is TAG % 9 and Y coordinate is TAG / 9)
In the callback you can loop over the textboxes and colorize all boxes as you like. First do the "check row" loop (a rough sketch):
var currentTextBox = (TextBox)sender;
int x = ((int)currentTextBox.Tag) % 9; // column of the current box
int y = ((int)currentTextBox.Tag) / 9; // row of the current box
// First assign the current value to the backing store (0 means empty or invalid)
int value;
if (!int.TryParse(currentTextBox.Text, out value) || value < 1 || value > 9) value = 0;
Values[x, y] = value;
// Array to hold the status of a number (is it already used?); index 0 is the empty marker
bool[] isUsed = new bool[10];
for (int col = 0; col < 9; col++)
{
    // do not compare with self
    if (col == x) continue;
    isUsed[Values[col, y]] = true;
}
// now we have the status of all other boxes in the row
currentTextBox.BackColor = (value != 0 && isUsed[value])
    ? System.Drawing.Color.Red
    : System.Drawing.Color.Green;
// now repeat the procedure for the column iterating the rows and for the blocks

I would suggest a dynamic approach to this. Consider each board item as a cell (this would be its own class). The class would contain a numeric value and other properties that could be useful (e.g. a list of possible values).
You would then create 3 collections of the cells, these would be:
A collection of rows of 9 cells (for tracking each row)
A collection of columns of 9 cells (for tracking each column)
A collection of 3x3 blocks of 9 cells (for tracking each block)
These collections would share references - each cell object would appear once in each collection. Each cell could also contain a reference to each of the 3 collections.
Now, when a cell value is changed, you can get references to each of the 3 collections and then apply a standard set of Sudoku logic against any of those collections.
You then have some display logic that can walk the boards of cells and output to the display (your View) your values.
Enjoy - this is a fun project.
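One way to picture the shared-reference design described above is the following sketch. The names Cell, Board and Conflicts are illustrative choices rather than anything from the answer, and a value of 0 is assumed to mean an empty cell.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative model: one Cell object is shared by its row, column and block.
public class Cell
{
    public int Row;
    public int Column;
    public int Value; // assumption: 0 means "empty"
}

public class Board
{
    public readonly Cell[,] Cells = new Cell[9, 9];
    public readonly List<List<Cell>> Rows = NewGroups();
    public readonly List<List<Cell>> Columns = NewGroups();
    public readonly List<List<Cell>> Blocks = NewGroups();

    public Board()
    {
        for (int r = 0; r < 9; r++)
        {
            for (int c = 0; c < 9; c++)
            {
                var cell = new Cell { Row = r, Column = c };
                Cells[r, c] = cell;
                // The same reference goes into all three collections.
                Rows[r].Add(cell);
                Columns[c].Add(cell);
                Blocks[BlockIndex(r, c)].Add(cell);
            }
        }
    }

    // Creates nine empty groups.
    private static List<List<Cell>> NewGroups()
    {
        return Enumerable.Range(0, 9).Select(i => new List<Cell>()).ToList();
    }

    // Index (0..8) of the 3x3 block that contains the given position.
    private static int BlockIndex(int row, int column)
    {
        return (row / 3) * 3 + column / 3;
    }

    // Other cells in the same row, column or block that hold the same value.
    public List<Cell> Conflicts(Cell cell)
    {
        if (cell.Value == 0) return new List<Cell>();
        return Rows[cell.Row]
            .Concat(Columns[cell.Column])
            .Concat(Blocks[BlockIndex(cell.Row, cell.Column)])
            .Where(other => other != cell && other.Value == cell.Value)
            .Distinct()
            .ToList();
    }
}
```

The display logic could then call Conflicts for the cell that just changed and color the matching textboxes, keeping the rules separate from the form.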

Get Series value from a mouse click

I use Microsoft.DataVisualization.Charting and want to get the value of the point when I click on it.
My problem: I want exactly the value I clicked, even if it's only a value calculated by the chart and lies between 2 points.
Example: 3 points: P(0;3), P(1;6), P(3;12)
When I click at x-value 2, I want to get 9 as the result if the line is linear.
Currently I do this:
HitTestResult[] hits = chart.HitTest(e.X, e.Y, false, ChartElementType.PlottingArea);
//DataInformation save the DateTime and Value for later use
DataInformation[] dinfo = new DataInformation[hits.Length];
foreach (ChartArea area in chart.ChartAreas)
{
    area.CursorX.LineWidth = 0; //clear old lines
}
for (int i = 0; i < hits.Length; i++) //for all hits
{
    if (hits[i].ChartElementType == ChartElementType.PlottingArea)
    {
        //val saves the x-value clicked in the ChartArea
        double val = hits[i].ChartArea.AxisX.PixelPositionToValue(e.X);
        DataPoint pt = chart.Series[hits[i].ChartArea.Name].Points.Last(elem => elem.XValue < val);
        dinfo[i].caption = hits[i].ChartArea.Name;
        dinfo[i].value = pt.YValues[0].ToString();
        //hits[i].ChartArea.CursorX.Position = pt.XValue;
    }
}
This shows the right values for every existing data point, but not for the clicked point.
How can I get the exact value?

It seems, there is no way to get the exact value. I changed to OxyPlot. OxyPlot can show the data much faster and you can get the exact value for any point.
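If the series is drawn as straight line segments, the value between two existing points can also be computed by hand with linear interpolation. The sketch below is independent of any charting library: LineInterpolation and ValueAt are hypothetical names, and the x-values are assumed to be strictly ascending. In the question's code, the x-values would come from DataPoint.XValue, the y-values from YValues[0], and x from PixelPositionToValue.

```csharp
using System;

// Hypothetical helper; not part of the charting library.
public static class LineInterpolation
{
    // Returns the y-value at x on the polyline through (xs[i], ys[i]).
    // Assumption: xs is strictly ascending and x lies within [xs[0], xs[last]].
    public static double ValueAt(double[] xs, double[] ys, double x)
    {
        if (xs.Length != ys.Length || xs.Length < 2)
            throw new ArgumentException("Need at least two points with matching x and y arrays.");
        if (x < xs[0] || x > xs[xs.Length - 1])
            throw new ArgumentOutOfRangeException("x");

        // Find the first point at or to the right of x.
        int i = 1;
        while (xs[i] < x) i++;

        // Interpolate between the neighbouring points i-1 and i.
        double t = (x - xs[i - 1]) / (xs[i] - xs[i - 1]);
        return ys[i - 1] + t * (ys[i] - ys[i - 1]);
    }
}
```

For the example from the question, the points P(0;3), P(1;6), P(3;12) and x = 2 give 6 + 0.5 * (12 - 6) = 9.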

iTextSharp GetFieldPositions to SetSimpleColumn

I'm using the latest version of iTextSharp found here: http://sourceforge.net/projects/itextsharp/
I am trying to use ColumnText.SetSimpleColumn after getting the position of some AcroFields using GetFieldPositions( fieldName ).
All the examples I can find show GetFieldPositions returning a float[]; however, this doesn't appear to be the case anymore. It now appears to return an IList, which doesn't (according to Visual Studio) implicitly convert to a float[].
Inside the return value at index 0 is a position member that is a Rectangle, but since the examples I've seen perform math operations on the returned float[], I'm not sure which values from the return value of GetFieldPositions to use when calling SetSimpleColumn. Here's one article that I'm referencing: http://blog.dmbcllc.com/2009/07/08/itextsharp-html-to-pdf-positioning-text/
Simplest accepted answer will be how to translate the value from GetFieldPositions to SetSimpleColumn.
Thanks!

I think this was done for two reasons: 1) GetFieldPositions() could actually return multiple items, because you can technically have more than one field with the same name, and 2) the original array method required knowing "magic array numbers" to find what was what. All of the code that you saw pretty much assumed that GetFieldPositions() only returned a single item, which is true 99% of the time. Instead of working with indexes, you can now work with normal properties.
So the code from the link that you posted:
float[] fieldPosition = null;
fieldPosition = fields.GetFieldPositions("fieldNameInThePDF");
left = fieldPosition[1];
right = fieldPosition[3];
top = fieldPosition[4];
bottom = fieldPosition[2];
if (rotation == 90)
{
    left = fieldPosition[2];
    right = fieldPosition[4];
    top = pageSize.Right - fieldPosition[1];
    bottom = pageSize.Right - fieldPosition[3];
}
Should be converted to:
IList<AcroFields.FieldPosition> fieldPositions = fields.GetFieldPositions("fieldNameInThePDF");
if (fieldPositions == null || fieldPositions.Count <= 0)
    throw new ApplicationException("Error locating field");
AcroFields.FieldPosition fieldPosition = fieldPositions[0];
left = fieldPosition.position.Left;
right = fieldPosition.position.Right;
top = fieldPosition.position.Top;
bottom = fieldPosition.position.Bottom;
if (rotation == 90)
{
    left = fieldPosition.position.Bottom;
    right = fieldPosition.position.Top;
    top = pageSize.Right - fieldPosition.position.Left;
    bottom = pageSize.Right - fieldPosition.position.Right;
}
