Hi, I am using PDFsharp to print user input onto positions in a template document.
The data (fields) are collected from the user (web page) and written at the appropriate positions on the document using the DrawString method.
Currently I am finding the pixel position of each templated field on each page by trial and error. It would be much easier if there were a way to determine the pixel position of each field on a PDF page.
Any suggestion would be most helpful.
thanks
I had the same issue as you, josephj1989, and I found what I believe to be our answer.
According to this page in the PDFSharp documentation,
The current implementation of PDFsharp has only one layout of the graphics context. The origin (0, 0) is top left and coordinates grow right and down. The unit of measure is always point (1/72 inch).
So, say I want to draw an image 3 inches from the left and 7.5 inches from the top of an 8.5 x 11 page, then I would do something like this:
PdfDocument document = GetBlankPdfDocument();
PdfPage myPage = document.AddPage();
myPage.Orientation = PdfSharp.PageOrientation.Portrait;
myPage.Width = XUnit.FromInch(8.5);
myPage.Height = XUnit.FromInch(11);
XImage myBarcode = GetSampleImage();
double myX = 3 * 72;
double myY = 7.5 * 72;
double someWidth = 126;
double someHeight = 36;
XGraphics gfx = XGraphics.FromPdfPage(myPage);
gfx.DrawImage(myBarcode, myX, myY, someWidth, someHeight);
I tested this out myself with a barcode image, and I found that when you use their formula for measuring positioning, you can get a level of precision at, well, 1/72 of an inch. That's not bad.
So, at this point, it'd be good to create some kind of encapsulation for such measurements so that we're focused mostly on the task at hand and not the details of conversion.
public static double GetCoordinateFromInch(double inches)
{
return inches * 72;
}
We could go on to make other helpers in this way...
public static double GetCoordinateFromCentimeter(double centimeters)
{
return centimeters / 2.54 * 72;
}
With such helper methods, we could do this with the previous sample code:
double myX = GetCoordinateFromInch(3);
double myY = GetCoordinateFromInch(7.5);
double someWidth = 126;
double someHeight = 36;
XGraphics gfx = XGraphics.FromPdfPage(myPage);
gfx.DrawImage(myBarcode, myX, myY, someWidth, someHeight);
I hope this is helpful. I'm sure you will write cleaner code than what is in my example. Additionally, there are probably much smarter ways to streamline this process, but I just wanted to put something here that would make immediate use of what we saw in the documentation.
Happy PDF Rendering!
When I used PDFsharp, my approach was to make use of the XUnit struct and to reference the top left point of the document as my starting point for X/Y positions.
Obviously, referencing the top left point of the document (0,0) for every element on a PdfPage will get messy. To combat this, I used the XRect struct to create rectangles for elements to sit within. Once the XRect is drawn onto the page, you can reference the X/Y position of the rectangle via the XRect's properties. Then, with some basic maths using those coordinates and the width/height of the XRect, you should be able to calculate the coordinates for the position of the next element you want to add to the PdfPage.
Following this code sample, I've provided a rough sketch of what the end result would be. The code is untested but is very heavily based on code in production right now.
// Create a new PDF document
PdfDocument document = new PdfDocument();
// Create an empty page with the default size/orientation
PdfPage page = document.AddPage();
page.Orientation = PageOrientation.Landscape;
page.Width = XUnit.FromMillimeter(300);
page.Height = XUnit.FromMillimeter(200);
// Get an XGraphics object for drawing
XGraphics gfx = XGraphics.FromPdfPage(page);
// Add the first rectangle
XUnit horizontalOffset = XUnit.FromMillimeter(5);
XUnit verticalOffset = XUnit.FromMillimeter(5);
XUnit columnWidth = XUnit.FromMillimeter(100);
XUnit columnHeight = page.Height - (2 * verticalOffset);
XRect columnRect = new XRect(horizontalOffset, verticalOffset, columnWidth, columnHeight);
gfx.DrawRectangle(XBrushes.Teal, columnRect);
// Insert an image inside the rectangle, referencing the Left and Top properties of the rectangle for image placement
XImage topLogo = XImage.FromFile(GetFilePath(@"content\img\pdfs\standard\logo-no-strapline.jpg")); // GetFilePath is a private method, not shown for brevity
gfx.DrawImage(topLogo,
columnRect.Left + XUnit.FromMillimeter(5),
columnRect.Top + XUnit.FromMillimeter(5),
columnRect.Width - XUnit.FromMillimeter(10),
XUnit.FromMillimeter(38));
And the output:
Lastly, I'm sure you're aware, but there's a good resource of PdfSharp samples here.
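The "basic maths" for placing the next element relative to an existing rectangle could be sketched roughly as follows. `Box` is a hypothetical stand-in used only to keep the example self-contained; PDFsharp's XRect exposes similar Left, Top, Right, Bottom, Width and Height members. The helper names `Below` and `RightOf` are also made up for illustration, and all values are in points.

```csharp
using System;

// Hypothetical stand-in for a rectangle measured in points (not a PDFsharp type).
public readonly struct Box
{
    public double Left { get; }
    public double Top { get; }
    public double Width { get; }
    public double Height { get; }
    public double Right => Left + Width;
    public double Bottom => Top + Height;

    public Box(double left, double top, double width, double height)
    {
        Left = left;
        Top = top;
        Width = width;
        Height = height;
    }
}

public static class LayoutSketch
{
    // New box directly below the anchor, same left edge and width, separated by a gap.
    public static Box Below(Box anchor, double gap, double height) =>
        new Box(anchor.Left, anchor.Bottom + gap, anchor.Width, height);

    // New box to the right of the anchor, same top edge and height, separated by a gap.
    public static Box RightOf(Box anchor, double gap, double width) =>
        new Box(anchor.Right + gap, anchor.Top, width, anchor.Height);

    public static void Main()
    {
        var logo = new Box(10, 10, 200, 100);
        var caption = Below(logo, 5, 20);      // starts at y = 10 + 100 + 5
        var sidebar = RightOf(logo, 5, 50);    // starts at x = 10 + 200 + 5
        Console.WriteLine($"{caption.Left}, {caption.Top}");
        Console.WriteLine($"{sidebar.Left}, {sidebar.Top}");
    }
}
```

Because the origin is top left and y grows downward, "below" means adding to the Y value, which is why the next element starts at the anchor's Bottom plus the gap.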
PDF files have no pixels and no pixel positions.
You can print PDF pages (use the "Actual size" size option) and measure positions with a ruler (this works pretty well with our printers).
Or you can use Adobe Acrobat to measure the positions of items on the PDF pages.
A bit of trial and error remains as you may have to give or take half a millimeter.
Depending on your ruler you can use XUnit.FromMillimeter or XUnit.FromInch to get the point positions for PDFsharp (but points are not pixels).
The current PDFsharp samples are here:
http://www.pdfsharp.net/wiki/PDFsharpSamples.ashx
When I use the pdfCanvas object I have the MoveText method where I can set the x and y coordinate but I don't see that in the Paragraph object? Second thing is why do I need the rectangle object I am not adding any Rectangle to the pdf just Text. Where I want the text to take the full width of the page. Can I get the size of the font and text and then calculate the centeredWidth and centeredHeight if I only have MoveText method in pdfCanvas?
for (int i = 1; i <= numberOfPages; i++)
{
PdfPage pdfPage = pdfDocument.GetPage(i);
iText.Kernel.Geom.Rectangle pageSizeWithRotation = pdfPage.GetPageSizeWithRotation();
float n2 = 15F;
float n3 = pageSizeWithRotation.GetHeight() - 10F;
float frontSize = 6.25f;
PdfCanvas pdfCanvas = new PdfCanvas(pdfPage);
iText.Kernel.Geom.Rectangle rectangle = new iText.Kernel.Geom.Rectangle(100, 100, 100, 100);
Canvas canvas = new Canvas(pdfCanvas, rectangle);
PdfFont font = PdfFontFactory.CreateFont("C:\\Windows\\Fonts\\ARIALN.TTF");
Paragraph p = new Paragraph()
.Add(disclaimerText)
.SetFont(font)
.SetFontSize(frontSize)
.SetTextAlignment(TextAlignment.CENTER);
canvas.Add(p);
canvas.Close();
//pdfCanvas.BeginText()
// .SetFillColorRgb(0, 0, 0)
// .SetFontAndSize(PdfFontFactory.CreateFont("C:\\Windows\\Fonts\\ARIALN.TTF"), frontSize)
// .MoveText(n2, n3)
// .ShowText(disclaimerText)
// .EndText();
}
First of all, please be aware that you are using different parts of the iText API with PdfCanvas on one side and Canvas on the other side:
PdfCanvas is merely a thin wrapper for the instructions written into the PDF. When using this class, you have to determine yourself where you want to start text lines, where to break the lines, how much space to add between characters, words, and lines, and so on.
Canvas (and Document) on the other hand features its own layout engine, you only initialize it with the PdfCanvas to work on and the coordinate ranges in which you want it to operate, and then you feed it paragraphs, tables, etc. which Canvas arranges properly.
Thus, you essentially have the choice, do you foremost want to arrange everything yourself, or do you foremost want to leave that task to iText.
That being said, let's look at your questions:
When I use the pdfCanvas object I have the MoveText method where I can set the x and y coordinate but I don't see that in the Paragraph object?
Paragraphs mostly are designed for automatic layout by Canvas and Document. Nonetheless, you can statically arrange them at given coordinates using the SetFixedPosition overloads.
Second thing is why do I need the rectangle object I am not adding any Rectangle to the pdf just Text.
You need the rectangle to tell Canvas where on the (theoretically endless) PdfCanvas coordinate plane it shall arrange the objects you give it.
Where I want the text to take the full width of the page.
Then use the full crop box of that page, assuming you mean the full visible width of the page on screen or the final printed product. You can get it using the PdfPage method GetCropBox.
If you are not sure which "full widths of the page" there are, have a look at this answer.
Can I get the size of the font and text and then calculate the centeredWidth and centeredHeight if I only have MoveText method in pdfCanvas?
You don't get the size of the font, you set it (using SetFontAndSize as you show in your code). You also set the font. And the PdfFont object you set has nice GetWidth overloads to determine the widths of some text drawn using that font. That in combination with the crop box mentioned above and trivial math allows you to calculate everything you need for simple text drawing.
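The "trivial math" for centering one line of text could look roughly like this. It is a sketch in plain arithmetic: the class and method names are hypothetical, and the numbers in `Main` are assumed values. In iText 7 the text width would typically come from something like `font.GetWidth(disclaimerText, frontSize)` and the box from `pdfPage.GetCropBox()`, but those calls are left out here so the example runs without the library.

```csharp
using System;

public static class TextCentering
{
    // X coordinate (in points) at which a line of the given width must start
    // so that it is horizontally centered inside a box.
    public static double CenteredX(double boxLeft, double boxWidth, double textWidth) =>
        boxLeft + (boxWidth - textWidth) / 2.0;

    public static void Main()
    {
        // Assumed values: a crop box starting at x = 0 and 595 pt wide,
        // and a text line measured as 195 pt wide.
        double x = CenteredX(0, 595, 195);
        Console.WriteLine(x); // 200
    }
}
```

The resulting x, together with a chosen baseline y, is what would be passed to MoveText when drawing with PdfCanvas directly.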
I need to print an image on a PDF file that, when printed, is exactly 80 mm high. I know the page sizes, and I know the DPI of the image I am putting on the PDF. But when I print it, it comes out at 78.5 mm... and the task I am doing needs to be exact.
I load the image from disk; I know the DPI and the pixel height/width. I load it into an Image object (setting the vertical and horizontal DPI to 300), and then add it to the PDF.
public static void SavePdf(Image img, string filename)
{
// Create a new PDF document
PdfDocument document = new PdfDocument();
document.Info.Title = "Test";
// Create an empty page
PdfPage page = document.AddPage();
page.Width = SharedMethods.MmToPixel(520);
page.Height = SharedMethods.MmToPixel(110);
// Get an XGraphics object for drawing
XGraphics gfx = XGraphics.FromPdfPage(page);
XImage image = XImage.FromGdiPlusImage(img);
gfx.DrawImage(image, 0, 0, image.PixelWidth, image.PixelHeight);
// Save the document...
document.Save(filename);
}
The image has to start at the top left of the page.
But it seems to start a few mm in from the top, and seems to reduce in size from 80 mm to 78.5.
I heard PDFs resize by 98% due to ... something. Can this be the issue and I need to upscale my image to 102%?
Note, I know the mm size I want, so I convert that to pixels based on my DPI and a constant I found online:
const double milimetresPerInch = 25.4; // as one inch is 25.4 mm
const double dpi = 300;
public static int MmToPixel(double mm)
{
double pixel = mm * dpi / milimetresPerInch;
return (int)Math.Round(pixel);
}
Edit:
I am using the XUnit.FromMillimeter now, but is this correct?
page.Width = XUnit.FromMillimeter(520);
page.Height = XUnit.FromMillimeter(110);
I am then loading the image, which is the same size as the paper size (in mm) like this:
XImage image = XImage.FromGdiPlusImage(img);
gfx.DrawImage(image, 0, 0, image.PixelWidth, image.PixelHeight);
Is that right? It should fit the entire page, but I notice I am using pixels. Maybe I need to use FromPixels or, somehow, FromMillimeter?
I heard PDFs resize by 98% due to ... something.
Sources for that info?
Printers often have non-print areas, typically a few mm on each side with modern printers. By default Adobe Reader reduces the page size while printing to fit the whole page into the printing area. Disable automatic scaling when printing with Adobe Reader.
If the image starts at the top left position, parts of the image may be cut-off due to the non-printing area.
Adobe Reader can measure items on the PDF page. Thus you can easily verify whether your image has the dimensions it should have. This will tell you whether there is a problem with printing or with embedding the image into the PDF.
Adobe Reader can also show you the size of the page - maybe the page has a different mm size than you expect.
You should use PDFsharp's methods like XUnit.FromMillimeter to convert units. Your method MmToPixel may do more harm than good. PDF pages do not have pixels, and measurements are in points.
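To make the unit mismatch concrete, here is a small sketch in plain arithmetic, assuming the 300 DPI image from the question and 72 points per inch. The class and method names are hypothetical and not part of PDFsharp.

```csharp
using System;

public static class ImageSize
{
    // Physical length, in points, of a pixel count at a given resolution.
    public static double PixelsToPoints(int pixels, double dpi) => pixels * 72.0 / dpi;

    // Converts points to millimeters (72 points and 25.4 mm per inch).
    public static double PointsToMillimeters(double points) => points * 25.4 / 72.0;

    public static void Main()
    {
        int pixelHeight = 945; // roughly 80 mm at 300 DPI, rounded to whole pixels

        // Correct: convert the pixel count to points before using it as a size.
        double heightInPoints = PixelsToPoints(pixelHeight, 300);
        Console.WriteLine(PointsToMillimeters(heightInPoints));

        // Incorrect: treating the pixel count itself as points makes the image far too tall.
        Console.WriteLine(PointsToMillimeters(pixelHeight));
    }
}
```

With PDFsharp, the converted values (or a size built with XUnit.FromMillimeter) would be passed as the width and height arguments of DrawImage instead of image.PixelWidth and image.PixelHeight.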
I am using PDFBox and the iTextSharp DLL to process a PDF,
so that I get the coordinates of the text within a rectangle. The rectangle coordinates are extracted using iTextSharp.
Basically, I get the rectangle coordinates from iTextSharp, which uses a coordinate system with its origin at the lower left, and I get the page text from PDFBox, which uses a coordinate system with its origin at the upper left.
I need help converting the coordinates from lower left to upper left.
Updating my question
Pardon me if my question was unclear or did not provide full information.
Let me try to give more details from the start.
I am working on a tool where I get a PDF in which a rectangle is drawn using drawing markup in the comments section. I read the rectangle coordinates using iTextSharp:
PdfDictionary pageDict = pdReader.GetPageN(page_no);
PdfArray annotArray = pageDict.GetAsArray(PdfName.ANNOTS);
where pdReader is PdfReader.
The page text, along with its coordinates, is extracted using PDFBox. I created a class, pdfBoxTextExtraction, in which I process the text and coordinates so that it returns the text and llx, lly, urx, ury line by line (please note: line by line, not sentence by sentence).
So I want to extract the text that lies within the rectangle coordinates. I got stuck because the rectangle coordinates returned by iTextSharp (llx, lly, urx, ury) have their origin at the lower left, whereas the text coordinates returned by PDFBox have their origin at the upper left. Then I realised I need to adjust the y-axis so that the origin moves from lower left to upper left. For that, I got the height of the page and the height of the crop box:
iTextSharp.text.Rectangle mediabox = reader.GetPageSize(page_no);
iTextSharp.text.Rectangle cropbox = reader.GetCropBox(page_no);
I did some basic adjustment:
lly = mediabox.Top - lly
ury = mediabox.Top - ury
In some cases the adjustment worked, whereas for some PDFs I needed to do the adjustment against the crop box:
lly = cropbox.Top - lly
ury = cropbox.Top - ury
On some PDFs neither worked.
All I need is help in adjusting the rectangle coordinates so that I get the text within the rectangle.
The coordinate system in PDF is defined in ISO-32000-1. This ISO standard explains that the X-axis is oriented towards the right, whereas the Y-axis has an upward orientation. This is the default. These are the coordinates that are returned by iText (behind the scenes, iText resolves all CTM transformations).
If you want to transform the coordinates returned by iText so that you get coordinates in a coordinate system where the Y axis has a downward orientation, you could for instance subtract the Y value returned by iText from the Y-coordinate of the top of the page.
An example: Suppose that we are dealing with an A4 page, where the Y coordinate of the bottom is 0 and the Y coordinate of the top is 842. If you have Y coordinates such as y1 = 806 and y2 = 36, then you can do this:
y = 842 - y;
Now y1 = 36 and y2 = 806. You have just reversed the orientation of the Y-axis using nothing more than simple high-school math.
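Applied to a whole rectangle rather than a single value, the same subtraction could be sketched like this. The names are hypothetical. Which page boundary supplies `pageTop` (media box or crop box) is an assumption that depends on what the other tool measures from.

```csharp
using System;

public static class YFlip
{
    // Converts the bottom-up Y range of a rectangle (lly, ury) into a top-down
    // range measured from the top of the page. Note that the upper edge (ury)
    // becomes the smaller value after the flip.
    public static (double Top, double Bottom) ToTopDown(double lly, double ury, double pageTop) =>
        (pageTop - ury, pageTop - lly);

    public static void Main()
    {
        // A4 page whose top is at y = 842, rectangle spanning y = 36 .. 806.
        var (top, bottom) = ToTopDown(36, 806, 842);
        Console.WriteLine($"{top}, {bottom}"); // 36, 806
    }
}
```

The X values need no change as long as the page's left edge is at x = 0 and the page is not rotated.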
Update based on an extra comment:
Each page has a media box. This defines the most important page boundaries. Other page boundaries may be present, but none of them shall exceed the media box (if they do, then your PDF is in violation with ISO-32000-1).
The crop box defines the visible area of the page. By default (for instance if a crop box entry is missing), the crop box coincides with the media box.
In your comment, you say that you subtract llx from the height. This is incorrect. llx is the lower-left x coordinate, whereas the height is a property measured on the Y axis, unless the page is rotated. Did you check if the page dictionary has a /Rotate value?
You also claim that the values returned by iText do not match the values returned by PdfBox. Note that the values returned by iText conform with the coordinate system as defined by the ISO standard. If PdfBox doesn't follow this standard, you should ask the people from PdfBox why they didn't follow the standard, and what coordinate system they are using instead.
Maybe that's what mkl's comment is about. He wrote:
Y' = Ymax - Y. X' = X - Xmin.
Maybe PdfBox searches for the maximum Y value Ymax and the minimum X value Xmin and then applies the above transformation on all coordinates. This is a useful transformation if you want to render a PDF, but it's unwise to perform such an operation if you want to use the coordinates, for instance to add content at specific positions relative to text on the page (because the transformed coordinates are no longer "PDF" coordinates).
Remark:
You say you need PdfBox to get the text of a page. Why do you need this extra tool? iText is perfectly capable of extracting and reordering the text on a page (assuming that you use the correct extraction strategy). If not, please clarify.
Note that we recently decided to support Type3 fonts, although we weren't convinced that this makes sense (see Text extraction is empty and unknown for text has type3 font using PDFBox,iText (difficult topic!) to understand why not).
What some consider "wrong extraction" can often be "wrong interpretation" of what is extracted as explained in this mailing-list answer: http://thread.gmane.org/gmane.comp.java.lib.itext.general/66829/focus=66830
There are other cases where we follow the spec, leading to results that are different than what PdfBox returns. Watch https://www.youtube.com/watch?v=wxGEEv7ibHE for more info.
if ((mediabox.Top - mediabox.Height) != 0)
{
topY = mediabox.Top;
heightY = mediabox.Height;
diffY = topY - heightY;
lly_adjust = (topY - ury) + diffY;
ury_adjust = (topY - lly) + diffY;
}
else if ((cropbox.Top - cropbox.Height) != 0)
{
topY = mediabox.Top;
heightY = cropbox.Top;
diffY = topY - heightY;
lly_adjust = (topY - ury) - diffY;
ury_adjust = (topY - lly) - diffY;
}
else
{
lly_adjust = mediabox.Top - ury;
ury_adjust = mediabox.Top - lly;
}
These are the final adjustments I made.
How can I get the orientation of a page within a PDF document in .NET?
A PDF document may contain portrait and landscape pages... Right?
Any help would be gratefully appreciated.
Using iTextSharp you can do this pretty easily:
' File to test
Dim TestFileName = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf")
' Create an object to read the PDF
Dim Reader As New iTextSharp.text.pdf.PdfReader(TestFileName)
' Get the page size of the first page (iTextSharp page numbers start at 1)
Dim Rect = Reader.GetPageSize(1)
' Compare the dimensions of the rectangle returned. For simplicity I'm saying that a square object is portrait, too
Dim IsPortrait = Rect.Height >= Rect.Width
With straightforward approaches, you'll get about 95% of the way there. You'll need the page dimensions, which you can get from the MediaBox, but really you want the CropBox if it exists because it can crop a portrait page into a landscape page (or vice versa). In addition, you need to look at the /Rotate entry in the page dictionary because a page could be rotated to any of the four compass points. And just to make life particularly interesting, the content of the page could be rendered in any orientation. You could have an "upright" portrait page with the text drawn upside down.
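Taking the rotation into account could be sketched as below. This is plain arithmetic with hypothetical names. The width and height are assumed to come from the crop box (or the media box if there is none), and the rotation from the page's /Rotate entry, which iTextSharp exposes through `PdfReader.GetPageRotation`. It says nothing about how the content itself is drawn.

```csharp
using System;

public static class PageOrientationSketch
{
    // Returns true when the page is portrait (or square) after applying its rotation.
    public static bool IsPortrait(double width, double height, int rotationDegrees)
    {
        // Normalize the rotation to 0..359 so that negative values work too.
        int rotation = ((rotationDegrees % 360) + 360) % 360;

        // A quarter turn in either direction swaps the visible width and height.
        bool swapped = rotation == 90 || rotation == 270;
        double visibleWidth = swapped ? height : width;
        double visibleHeight = swapped ? width : height;

        return visibleHeight >= visibleWidth;
    }

    public static void Main()
    {
        Console.WriteLine(IsPortrait(595, 842, 0));   // True
        Console.WriteLine(IsPortrait(595, 842, 90));  // False
        Console.WriteLine(IsPortrait(842, 595, 270)); // True
    }
}
```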
I am trying to print an image (from file) to the printer using a PrintDocument.
I am resizing my image so that it is scaled to fill the page on the printout, but when I print it the image is cropped slightly.
EDIT 2
I am using the margins to calculate the area to use:
With printSettings.DefaultPageSettings
Dim areaWidth As Single = .Bounds.Width - .Margins.Left - .Margins.Right
Dim areaHeight As Single = .Bounds.Height - .Margins.Top - .Margins.Bottom
End With
The bounds of the page are 1169x827 (A4) and with the margins it is 1137x795.
After resize my image size is 1092x682 and I am using the following code to draw it:
e.Graphics.DrawImage(printBitmap, .Margins.Left, .Margins.Top)
The annoying thing is that when I print to the PrintPreviewDialog it is scaled perfectly but when I print the exact same code to the actual printer it does not fit.
EDIT 3
Full code can be found at this url
Usage:
Dim clsPrint As New clsPrinting
With clsPrint
.Landscape = True
.SetMinimumMargins()
If .ShowPrintDialog Then
.Documentname = "Some doc name"
.Preview = False 'When True shows ok
.PrintImage("filename of a png file")
End If
End With
Try using e.Graphics.VisibleClipBounds for the printable page size in the PrintPage function. As Hans said, it's better not to resize your image before printing.
You must work with MarginBounds:
in C#:
e.Graphics.DrawImage(your_image, e.MarginBounds);
in C++/CLI:
e->Graphics->DrawImage(your_image, e->MarginBounds);
Note: If your image doesn't have the same aspect ratio you'll need to adjust. In this example width of the image exceeded the page width:
Dim adjustment As Double = img.Width / e.MarginBounds.Width
e.Graphics.DrawImage(img, New Rectangle(New Point(0, 0), New Size(CInt(img.Width / adjustment), CInt(img.Height / adjustment))))
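A more general version of this adjustment, covering images that are too tall as well as too wide, could be sketched as follows. It is plain arithmetic with hypothetical names. The bounds are assumed to be the width and height of e.MarginBounds.

```csharp
using System;

public static class AspectFit
{
    // Largest size with the image's aspect ratio that fits inside the bounds.
    public static (int Width, int Height) Fit(int imageWidth, int imageHeight,
                                              int boundsWidth, int boundsHeight)
    {
        // The smaller of the two ratios is the one that keeps both sides inside.
        double scale = Math.Min((double)boundsWidth / imageWidth,
                                (double)boundsHeight / imageHeight);
        return ((int)Math.Round(imageWidth * scale),
                (int)Math.Round(imageHeight * scale));
    }

    public static void Main()
    {
        // Printable area of 1137 x 795, as in the question.
        var wide = Fit(2274, 1000, 1137, 795);
        Console.WriteLine($"{wide.Width} x {wide.Height}"); // 1137 x 500

        var tall = Fit(1000, 1590, 1137, 795);
        Console.WriteLine($"{tall.Width} x {tall.Height}"); // 500 x 795
    }
}
```

The resulting size would go into the destination rectangle passed to DrawImage, positioned at the top left corner of the margin bounds.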
Sounds like you are wanting to print a page with a full bleed which most personal printers are not capable of. As one of the comments above mentions, factor in the margins to re-size the image to the appropriate size.
I did not find a solution to this problem. I worked around it by using the printer margins when performing a print preview and ignoring the margins (start at 0,0 origin) when actually printing. I believe that this is possibly a bug in the printer driver? But I can't confirm.