When I use the pdfCanvas object I have the MoveText method where I can set the x and y coordinate but I don't see that in the Paragraph object? Second thing is why do I need the rectangle object I am not adding any Rectangle to the pdf just Text. Where I want the text to take the full width of the page. Can I get the size of the font and text and then calculate the centeredWidth and centeredHeight if I only have MoveText method in pdfCanvas?
for (int i = 1; i <= numberOfPages; i++)
{
    PdfPage pdfPage = pdfDocument.GetPage(i);
    iText.Kernel.Geom.Rectangle pageSizeWithRotation = pdfPage.GetPageSizeWithRotation();
    float n2 = 15F;
    float n3 = pageSizeWithRotation.GetHeight() - 10F;
    float frontSize = 6.25f;
    PdfCanvas pdfCanvas = new PdfCanvas(pdfPage);
    iText.Kernel.Geom.Rectangle rectangle = new iText.Kernel.Geom.Rectangle(100, 100, 100, 100);
    Canvas canvas = new Canvas(pdfCanvas, rectangle);
    PdfFont font = PdfFontFactory.CreateFont("C:\\Windows\\Fonts\\ARIALN.TTF");
    Paragraph p = new Paragraph()
        .Add(disclaimerText)
        .SetFont(font)
        .SetFontSize(frontSize)
        .SetTextAlignment(TextAlignment.CENTER);
    canvas.Add(p);
    canvas.Close();
    //pdfCanvas.BeginText()
    //    .SetFillColorRgb(0, 0, 0)
    //    .SetFontAndSize(PdfFontFactory.CreateFont("C:\\Windows\\Fonts\\ARIALN.TTF"), frontSize)
    //    .MoveText(n2, n3)
    //    .ShowText(disclaimerText)
    //    .EndText();
}
First of all, please be aware that you are using two different parts of the iText API, PdfCanvas on one side and Canvas on the other:
PdfCanvas is merely a thin wrapper for the instructions written into the PDF. When using this class, you have to determine yourself where you want to start text lines, where to break the lines, how much space to add between characters, words, and lines, and so on.
Canvas (and Document), on the other hand, features its own layout engine. You only initialize it with the PdfCanvas to work on and the coordinate range in which you want it to operate, and then you feed it paragraphs, tables, etc., which Canvas arranges properly.
Thus, you essentially have a choice: do you primarily want to arrange everything yourself, or do you primarily want to leave that task to iText?
That being said, let's look at your questions:
When I use the pdfCanvas object I have the MoveText method where I can set the x and y coordinate but I don't see that in the Paragraph object?
Paragraphs mostly are designed for automatic layout by Canvas and Document. Nonetheless, you can statically arrange them at given coordinates using the SetFixedPosition overloads.
Second thing is why do I need the rectangle object I am not adding any Rectangle to the pdf just Text.
You need the rectangle to tell Canvas where on the (theoretically endless) PdfCanvas coordinate plane it shall arrange the objects you give it.
Where I want the text to take the full width of the page.
Then use the full crop box of that page, assuming you mean the full visible width of the page on screen or the final printed product. You can get it using the PdfPage method GetCropBox.
If you are not sure which "full widths of the page" there are, have a look at this answer.
Can I get the size of the font and text and then calculate the centeredWidth and centeredHeight if I only have MoveText method in pdfCanvas?
You don't get the size of the font, you set it (using SetFontAndSize as you show in your code). You also set the font. And the PdfFont object you set has nice GetWidth overloads to determine the widths of some text drawn using that font. That in combination with the crop box mentioned above and trivial math allows you to calculate everything you need for simple text drawing.
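The "trivial math" could look roughly like the sketch below. It is plain arithmetic with no iText calls. The class and method names are made up for illustration, and it assumes you already have the crop box values and have measured the text width, ascent, and descent yourself (descent is assumed to be negative, i.e. below the baseline).

```csharp
using System;

public static class TextCentering
{
    // X coordinate at which a line of the given width starts so that it is
    // horizontally centered inside the box.
    public static float CenteredX(float boxLeft, float boxWidth, float textWidth)
    {
        return boxLeft + (boxWidth - textWidth) / 2f;
    }

    // Baseline Y coordinate that vertically centers the text inside the box.
    // The text spans from (baseline + descent) to (baseline + ascent), so its
    // middle sits (ascent + descent) / 2 above the baseline.
    public static float CenteredBaselineY(float boxBottom, float boxHeight, float ascent, float descent)
    {
        return boxBottom + boxHeight / 2f - (ascent + descent) / 2f;
    }
}
```

The two results are the values you would pass to MoveText before ShowText.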
Related
I'm trying to determine the size of the textframe that will be needed for a block of text. This is to then be exported for an InDesign script to create the page. All in a console application.
I've tried to create a WPF TextBlock and assign the Text and a Width, but the Height and ActualHeight is NaN.
How can I determine the size of a textframe that will be needed for some text? Is using a WPF / Winforms textblock the best solution (to try and take advantage of existing code), or is there some other, better workflow?
There are two classes in .NET that are used to draw text: TextRenderer and Graphics.
TextRenderer uses GDI to render the text, whereas Graphics uses GDI+. The two use slightly different methods for laying out text.
You can use Graphics.MeasureString or TextRenderer.MeasureText.
Example
using (Graphics g = Graphics.FromHwnd(IntPtr.Zero))
{
    SizeF size = g.MeasureString("some text", SystemFonts.DefaultFont);
}
For your case I would suggest using TextRenderer. A text wrapping example:
var size = TextRenderer.MeasureText(text, font, new Size(width, height), TextFormatFlags.WordBreak);
The third argument is the size of the drawing rectangle. You can pass height as 0 if you don't know it.
I am using PDFBox and the iTextSharp DLL to process a PDF
so that I get the coordinates of the text within a rectangle. The rectangle coordinates are extracted using iTextSharp.
Basically, I get the rectangle coordinates from iTextSharp, which uses a coordinate system with its origin at the lower left, and I get the page text from PDFBox, which uses a coordinate system with its origin at the upper left.
I need help converting the coordinates from lower left to upper left.
Updating my question
Pardon me if my question was unclear or incomplete.
Let me try to give more details from the start.
I am working on a tool that receives a PDF in which a rectangle has been drawn using drawing markup in the comments section. I read the rectangle coordinates using iTextSharp:
PdfDictionary pageDict = pdReader.GetPageN(page_no);
PdfArray annotArray = pageDict.GetAsArray(PdfName.ANNOTS);
where pdReader is a PdfReader.
The page text along with its coordinates is extracted using PDFBox. I have created a class, pdfBoxTextExtraction, that processes the text and coordinates so that it returns the text and llx, lly, urx, ury line by line (please note: line by line, not sentence by sentence).
I want to extract the text that lies within the rectangle. I got stuck because the rectangle coordinates returned by iTextSharp (llx, lly, urx, ury) have their origin at the lower left, whereas the text coordinates returned by PDFBox have their origin at the upper left. Then I realised I need to adjust the y-axis so that the origin moves from lower left to upper left. For that I got the height of the page and the height of the crop box:
iTextSharp.text.Rectangle mediabox = reader.GetPageSize(page_no);
iTextSharp.text.Rectangle cropbox = reader.GetCropBox(page_no);
I did some basic adjustment:
lly = mediabox.Top - lly
ury = mediabox.Top - ury
In some cases this adjustment worked, whereas for some PDFs I needed to do the adjustment based on the crop box:
lly = cropbox.Top - lly
ury = cropbox.Top - ury
and on some PDFs neither worked.
All I need is help in adjusting the rectangle coordinates so that I get the text within the rectangle.
The coordinate system in PDF is defined in ISO-32000-1. This ISO standard explains that the X-axis is oriented towards the right, whereas the Y-axis has an upward orientation. This is the default. These are the coordinates that are returned by iText (behind the scenes, iText resolves all CTM transformations).
If you want to transform the coordinates returned by iText so that you get coordinates in a coordinate system where the Y axis has a downward orientation, you could for instance subtract the Y value returned by iText from the Y-coordinate of the top of the page.
An example: Suppose that we are dealing with an A4 page, where the Y coordinate of the bottom is 0 and the Y coordinate of the top is 842. If you have Y coordinates such as y1 = 806 and y2 = 36, then you can do this:
y = 842 - y;
Now y1 = 36 and y2 = 806. You have just reversed the orientation of the Y-axis using nothing more than simple high-school math.
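As a minimal sketch of that arithmetic (the class and method names are made up for illustration, and pageTop is whatever Y coordinate you take as the top of the page): note that when you flip a rectangle, the former upper edge becomes the smaller Y value, so the two Y values swap roles.

```csharp
using System;

public static class YAxisFlip
{
    // Converts a Y coordinate measured upward from the bottom of the page
    // into one measured downward from the top of the page (and back again,
    // since the operation is its own inverse).
    public static float FlipY(float y, float pageTop)
    {
        return pageTop - y;
    }

    // Flips the vertical extent of a rectangle given as lly/ury.
    // In the top-down system the old ury yields the top edge (smaller value)
    // and the old lly yields the bottom edge (larger value).
    public static (float Top, float Bottom) FlipRect(float lly, float ury, float pageTop)
    {
        return (pageTop - ury, pageTop - lly);
    }
}
```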
Update based on an extra comment:
Each page has a media box. This defines the most important page boundaries. Other page boundaries may be present, but none of them shall exceed the media box (if they do, then your PDF is in violation of ISO-32000-1).
The crop box defines the visible area of the page. By default (for instance if a crop box entry is missing), the crop box coincides with the media box.
In your comment, you say that you subtract llx from the height. This is incorrect. llx is the lower-left x coordinate, whereas the height is a property measured on the Y axis, unless the page is rotated. Did you check if the page dictionary has a /Rotate value?
You also claim that the values returned by iText do not match the values returned by PdfBox. Note that the values returned by iText conform with the coordinate system as defined by the ISO standard. If PdfBox doesn't follow this standard, you should ask the people from PdfBox why they didn't follow the standard, and what coordinate system they are using instead.
Maybe that's what mkl's comment is about. He wrote:
Y' = Ymax - Y. X' = X - Xmin.
Maybe PdfBox searches for the maximum Y value Ymax and the minimum X value Xmin and then applies the above transformation on all coordinates. This is a useful transformation if you want to render a PDF, but it's unwise to perform such an operation if you want to use the coordinates, for instance to add content at specific positions relative to text on the page (because the transformed coordinates are no longer "PDF" coordinates).
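If that guess is right, converting between the two systems would look something like the sketch below. Which box (or which extreme values) supplies Xmin and Ymax is an assumption you would have to verify against your documents; the names here are hypothetical.

```csharp
using System;

public static class TopLeftTransform
{
    // PDF coordinates -> top-left based coordinates:
    // X' = X - Xmin, Y' = Ymax - Y.
    public static (float X, float Y) FromPdf(float x, float y, float xMin, float yMax)
    {
        return (x - xMin, yMax - y);
    }

    // Inverse transformation, back to PDF coordinates:
    // X = X' + Xmin, Y = Ymax - Y'.
    public static (float X, float Y) ToPdf(float xPrime, float yPrime, float xMin, float yMax)
    {
        return (xPrime + xMin, yMax - yPrime);
    }
}
```

This also shows why a page whose box does not start at (0, 0) breaks a naive "top minus y" adjustment: the X offset and the actual top value both matter.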
Remark:
You say you need PdfBox to get the text of a page. Why do you need this extra tool? iText is perfectly capable of extracting and reordering the text on a page (assuming that you use the correct extraction strategy). If not, please clarify.
Note that we recently decided to support Type3 fonts, although we weren't convinced that this makes sense (see Text extraction is empty and unknown for text has type3 font using PDFBox,iText (difficult topic!) to understand why not).
What some consider "wrong extraction" can often be "wrong interpretation" of what is extracted as explained in this mailing-list answer: http://thread.gmane.org/gmane.comp.java.lib.itext.general/66829/focus=66830
There are other cases where we follow the spec, leading to results that are different than what PdfBox returns. Watch https://www.youtube.com/watch?v=wxGEEv7ibHE for more info.
if ((mediabox.Top - mediabox.Height) != 0)
{
    topY = mediabox.Top;
    heightY = mediabox.Height;
    diffY = topY - heightY;
    lly_adjust = (topY - ury) + diffY;
    ury_adjust = (topY - lly) + diffY;
}
else if ((cropbox.Top - cropbox.Height) != 0)
{
    topY = mediabox.Top;
    heightY = cropbox.Top;
    diffY = topY - heightY;
    lly_adjust = (topY - ury) - diffY;
    ury_adjust = (topY - lly) - diffY;
}
else
{
    lly_adjust = mediabox.Top - ury;
    ury_adjust = mediabox.Top - lly;
}
These are the final adjustments I made.
Hi, I am using PDFsharp to print user input onto positions in a template document.
The data (fields) are collected from the user (web page) and written at the appropriate positions in the document using the DrawString method.
Currently I am finding the pixel position of each templated field on each page by trial and error. It would be much easier if there were a way to determine the pixel position of each field in a PDF page.
Any suggestions would be most helpful.
Thanks
I had the same issue as you, josephj1989, and I found what I believe to be our answer.
According to this page in the PDFSharp documentation,
The current implementation of PDFsharp has only one layout of the graphics context. The origin (0, 0) is top left and coordinates grow right and down. The unit of measure is always point (1/72 inch).
So, say I want to draw an image 3 inches from the left and 7.5 inches from the top of an 8.5 x 11 page, then I would do something like this:
PdfDocument document = GetBlankPdfDocument();
PdfPage myPage = document.AddPage();
myPage.Orientation = PdfSharp.PageOrientation.Portrait;
myPage.Width = XUnit.FromInch(8.5);
myPage.Height = XUnit.FromInch(11);
XImage myImage = GetSampleImage();
double myX = 3 * 72;
double myY = 7.5 * 72;
double someWidth = 126;
double someHeight = 36;
XGraphics gfx = XGraphics.FromPdfPage(myPage);
gfx.DrawImage(myImage, myX, myY, someWidth, someHeight);
I tested this out myself with a barcode image, and I found that when you use their formula for measuring positioning, you can get a level of precision at, well, 1/72 of an inch. That's not bad.
So, at this point, it'd be good to create some kind of encapsulation for such measurements so that we're focused mostly on the task at hand and not the details of conversion.
public static double GetCoordinateFromInch(double inches)
{
return inches * 72;
}
We could go on to make other helpers in this way...
public static double GetCoordinateFromCentimeter(double centimeters)
{
return centimeters / 2.54 * 72;
}
With such helper methods, we could do this with the previous sample code:
double myX = GetCoordinateFromInch(3);
double myY = GetCoordinateFromInch(7.5);
double someWidth = 126;
double someHeight = 36;
XGraphics gfx = XGraphics.FromPdfPage(myPage);
gfx.DrawImage(myImage, myX, myY, someWidth, someHeight);
I hope this is helpful. I'm sure you will write cleaner code than what is in my example. Additionally, there are probably much smarter ways to streamline this process, but I just wanted to put something here that would make immediate use of what we saw in the documentation.
Happy PDF Rendering!
When I used PDFsharp, my approach was to make use of the XUnit struct and to reference the top-left point of the document as my starting point for X/Y positions.
Obviously, referencing the top-left point of the document (0,0) for every element on a PdfPage will get messy. To combat this, I used the XRect class to create rectangles for elements to sit within. Once the XRect is drawn onto the page, you can reference the X/Y position of the rectangle via the XRect's properties. Then, with some basic maths using those coordinates and the width/height of the XRect, you should be able to calculate the coordinates for the position of the next element you want to add to the PdfPage.
The following code sample is a rough sketch of what the end result would be. The code is untested but is very heavily based on code in production right now.
// Create a new PDF document
PdfDocument document = new PdfDocument();
// Create an empty page with the default size/orientation
PdfPage page = document.AddPage();
page.Orientation = PageOrientation.Landscape;
page.Width = XUnit.FromMillimeter(300);
page.Height = XUnit.FromMillimeter(200);
// Get an XGraphics object for drawing
XGraphics gfx = XGraphics.FromPdfPage(page);
// Add the first rectangle
XUnit horizontalOffset = XUnit.FromMillimeter(5);
XUnit verticalOffset = XUnit.FromMillimeter(5);
XUnit columnWidth = XUnit.FromMillimeter(100);
XUnit columnHeight = page.Height - (2 * verticalOffset);
XRect columnRect = new XRect(horizontalOffset, verticalOffset, columnWidth, columnHeight);
gfx.DrawRectangle(XBrushes.Teal, columnRect);
// Insert an image inside the rectangle, referencing the Left and Top properties of the rectangle for image placement
XImage topLogo = XImage.FromFile(GetFilePath(@"content\img\pdfs\standard\logo-no-strapline.jpg")); // GetFilePath is a private method, not shown for brevity
gfx.DrawImage(topLogo,
columnRect.Left + XUnit.FromMillimeter(5),
columnRect.Top + XUnit.FromMillimeter(5),
columnRect.Width - XUnit.FromMillimeter(10),
XUnit.FromMillimeter(38));
Lastly, I'm sure you're aware, but there's a good resource of PdfSharp samples here.
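The "basic maths" for placing the next element relative to a rectangle might be sketched like this. It uses plain doubles and a small stand-in struct rather than PDFsharp's XRect, so LayoutBox and ColumnLayout are hypothetical names, and a top-left origin with Y growing downward is assumed, as in PDFsharp.

```csharp
using System;

// Plain-number stand-in for a rectangle with a top-left origin.
public struct LayoutBox
{
    public readonly double Left;
    public readonly double Top;
    public readonly double Width;
    public readonly double Height;

    public LayoutBox(double left, double top, double width, double height)
    {
        Left = left;
        Top = top;
        Width = width;
        Height = height;
    }

    public double Right { get { return Left + Width; } }
    public double Bottom { get { return Top + Height; } }
}

public static class ColumnLayout
{
    // A box of the same size placed to the right of the previous one,
    // separated by the given gap.
    public static LayoutBox NextColumn(LayoutBox previous, double gap)
    {
        return new LayoutBox(previous.Right + gap, previous.Top, previous.Width, previous.Height);
    }

    // A box of the given height placed directly below the previous one,
    // aligned with its left edge and width.
    public static LayoutBox Below(LayoutBox previous, double gap, double height)
    {
        return new LayoutBox(previous.Left, previous.Bottom + gap, previous.Width, height);
    }
}
```

With a first column at (5, 5) that is 100 wide and 190 high, NextColumn with a gap of 5 starts the second column at X = 110 with the same top edge.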
PDF files have no pixels and no pixel positions.
You can print PDF pages (use the "Actual size" option) and measure positions with a ruler (this works pretty well with our printers).
Or you can use Adobe Acrobat to measure the positions of items on the PDF pages.
A bit of trial and error remains as you may have to give or take half a millimeter.
Depending on your ruler, you can use XUnit.FromMillimeter or XUnit.FromInch to get the point positions for PDFsharp (but points are not pixels).
The current PDFsharp samples are here:
http://www.pdfsharp.net/wiki/PDFsharpSamples.ashx
I am trying to do some precise alignment with iTextSharp, but I keep falling short as I can't figure out a way to get a width / height value for a chunk or paragraph. If I create a paragraph that is a certain font and size and text, then its dimensions should be known, right?
I know that the default left/right/center alignments will do the trick for me mostly, but I have scenarios where knowing the dimensions will be most useful. Any ideas?
You can get a chunk's width using GetWidthPoint() and the height of a chunk is generally the font's size unless you're using only lowercase letters. If so then you can manually measure characters using BaseFont.GetCharBBox().
Paragraphs are flowable items, however, and they depend on the context that they are written into so measuring them is harder. (Chunks don't automatically wrap but Paragraphs do.) The best way to measure a paragraph is to write it to a PdfCell and then measure the PdfCell. You don't have to actually add the PdfCell to the document. The link below explains it a little more.
http://itext-general.2136553.n4.nabble.com/Linecount-td2146114.html
Use below code for exact size
Font font = ...;
BaseFont baseFont = font.BaseFont;
float width = baseFont.GetWidthPoint(text, fontSize);
float height = baseFont.GetAscentPoint(text, fontSize) - baseFont.GetDescentPoint(text, fontSize);
Is there any technique to calculate the height of text in AcroField?
In other words, I have a template PDF with a body section that I want to paste my long text into and I want to get the height of the text. If it overflows, insert a new page for rest of the text.
This will give height and width:
Vector curBaseline = renderInfo.GetBaseline().GetStartPoint();
Vector topRight = renderInfo.GetAscentLine().GetEndPoint();
iTextSharp.text.Rectangle rect = new iTextSharp.text.Rectangle(
curBaseline[Vector.I1], curBaseline[Vector.I2],
topRight[Vector.I1], topRight[Vector.I2]
);
Single curFontSize = rect.Height;
Just set your field to a font size of zero. This will automatically size the font so that the given text will fit into your field... within certain limits. I don't think it'll shrink below 6 points.
Another alternative would be to use a ColumnText and call myColText.go(true). This will "simulate" layout, letting you know what goes where without actually drawing anything to the PDF. Just whip up a columnText with the same dimensions, font&font-size as your field, and your results should be the same.
In fact, I believe iText's text field rendering code uses ColumnText internally. Yep, have a look at the source for TextField.getAppearance().
Note that the bounding box of your field isn't going to match the box the text is laid out into... you have to account for border style & thickness. That's why I suggest you look at the source.
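As a rough illustration of that adjustment, the sketch below shrinks the field's box by a uniform inset on every side before you use it for the simulated layout. The inset value is an assumption: how much iText actually reserves depends on the border style and thickness, so check the source for the real numbers. FieldInset and Shrink are made-up names.

```csharp
using System;

public static class FieldInset
{
    // Shrinks a box given as llx/lly/urx/ury by the same inset on all four
    // sides. The inset is a caller-supplied guess for border plus padding.
    public static (float Llx, float Lly, float Urx, float Ury) Shrink(
        float llx, float lly, float urx, float ury, float inset)
    {
        return (llx + inset, lly + inset, urx - inset, ury - inset);
    }
}
```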