I'll make it short. I need to reorder the logical structure elements in a PDF using iText 7 so that screen readers read them in the correct order. Is there any way to reorder existing logical structure elements?
I already tried getting the PdfStructTreeRoot, but it only has methods for adding new kids.
PdfDocument pdfDoc = new PdfDocument(new PdfReader(inputFile), new PdfWriter(outputFile));
PdfStructTreeRoot rootTree = pdfDoc.GetStructTreeRoot();
Is there a built-in way to reorder the elements?
I am developing a program with C# and the library PDFSharp.
I am currently using the following code to get the X and Y coordinates of a specific AcroField in a PDF document:
PdfTextField imageField = (PdfTextField)inForm.Fields[elementName];
PdfRectangle rect = imageField.Elements.GetRectangle(PdfAnnotation.Keys.Rect);
This works fine if only one field with a given name is present in the PDF document. However, if there are two fields both named "FirstName", even on separate pages, the "/Rect" and "/P" entries seem to disappear from the field, so I cannot use them to find the field's position or page.
Is there any other way to get the position of a field in the PDF, or any way to restore the "/Rect" and "/P" entries?
Thanks, RBrNx
What Mihai posted fits what I discovered from reverse engineering the PDF via PdfSharp. If there are multiple fields in the same document, they are nested under a parent container, and it is a reference to this parent container which PdfSharp will give you when using the AcroForm.Fields accessor. To get the Page and Rectangle elements for each field, you have to look at the children of that container.
To get the values you are looking for, you'll want to do something like this:
PdfTextField imageField = (PdfTextField)inForm.Fields[elementName];
var fieldRectangles = new List<PdfRectangle>();
if (imageField.HasKids)
{
    // The /Kids array lives on the parent field's dictionary.
    PdfArray kids = (PdfArray)imageField.Elements[PdfAcroField.Keys.Kids];
    foreach (var kid in kids)
    {
        // Each kid is an indirect reference to a widget dictionary.
        var kidValues = ((PdfReference)kid).Value as PdfDictionary;
        var rectangle = kidValues.Elements.GetRectangle(PdfAnnotation.Keys.Rect);
        fieldRectangles.Add(rectangle);
    }
}
The page reference element ("/P" tag) is also available from these "Kid" elements.
I'm not familiar with the PDFSharp API, but this is how it works in PDF:
- form fields have document scope and not page scope.
- 2 or more fields with the same name are in fact a single field with 2 or more widgets (widget annotations, the visual representation of a field). The /Rect and /P entries are stored at widget level. When the field has one widget, the widget is merged with the field so the /Rect and /P entries appear to be part of the field.
In your scenario you have to look for the /Kids key, which is an array. Drill down through the /Kids array (a child can have its own kids, and so on) until you reach the last level, where /Kids is no longer present. At this level you should find the /Rect and /P keys.
Each widget can have its own /Rect and /P keys since they can appear on different pages at different positions.
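The drill-down described above can be sketched without any PDF library. The following is only an illustration, assuming each PDF dictionary is modelled as a plain `Map<String, Object>` and the /Kids array as a `List`; the class name `WidgetFinder` and the method `collectWidgets` are hypothetical and are not part of PDFsharp or iText.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: walks a field modelled as nested maps and
// returns the leaf dictionaries, i.e. the ones without a "/Kids" entry.
final class WidgetFinder {

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> collectWidgets(Map<String, Object> field) {
        List<Map<String, Object>> widgets = new ArrayList<>();
        Object kids = field.get("/Kids");
        if (kids instanceof List) {
            // Intermediate level: recurse into every child.
            for (Object kid : (List<Object>) kids) {
                widgets.addAll(collectWidgets((Map<String, Object>) kid));
            }
        } else {
            // Last level: /Kids is absent, so /Rect and /P are expected here.
            widgets.add(field);
        }
        return widgets;
    }
}
```

With a real library the same recursion applies; only the dictionary and array types change. A field that has a single widget merged into it has no /Kids entry, so the sketch returns the field itself.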
I have to merge two PDF forms into one. The input PDFs have fillable fields, and the output should have the same fields. I was able to achieve this, but when I merge PDFs that share a field name, only the first field survives as a field and the second one is flattened. (Let's say PDF 1 has the fields 'Name' and 'Comment1' and PDF 2 has the fields 'Name' and 'Comment2'; when I merge them, the second 'Name' field is flattened in the output PDF.)
_stamp = new PdfStamper(_reader, pdfStream);
AcroFields fields = _stamp.AcroFields;
if (fields != null)
{
    _stamp.FormFlattening = false;
}
_stamp.Close();
_stamp = null;
In your code, you are using PdfStamper. That's a class to fill out forms, not to merge them. Merging forms is done using PdfCopy:
public void createPdf(String filename, PdfReader[] readers) throws IOException, DocumentException {
    Document document = new Document();
    PdfCopy copy = new PdfCopy(document, new FileOutputStream(filename));
    copy.setMergeFields();
    document.open();
    for (PdfReader reader : readers) {
        copy.addDocument(reader);
    }
    document.close();
    for (PdfReader reader : readers) {
        reader.close();
    }
}
You can find the full code samples here. You'll have to adapt it to C# (the method names are slightly different, but the code is similar).
It is very important that you don't forget to tell PdfCopy that you want to merge the fields, otherwise the form will not be copied.
You explain that you have a field named Name in one PDF and a field named Name in the other. If you merge both forms, this will result in a single field Name with only one value. You can't have a field Name on one page with one value and a field Name on another page with another value. That's why we also provide a sample where the fields are renamed. You can find that example here. You probably don't need that example; I'm only adding it for the sake of completeness.
Howdy,
I'm having some issues implementing a video "like" system in Umbraco and was wondering if any uber smart people were willing to make me feel dumb (learn something) and point me in the right direction.
The problem:
As I have edited properties on documents before, I decided to create a custom media type with an int "likes" property. On postback I would then increment this if the user hasn't liked the video before, or disable the button if they have.
I imagined doing something like this:
Document doc = new Document(mediaItemId);
int curValue = Convert.ToInt32(doc.getProperty("likes").Value);
doc.getProperty("likes").Value = (curValue + 1);
doc.Save();
http://our.umbraco.org/wiki/reference/api-cheatsheet/modifying-document-properties
The issue arose when I discovered that Umbraco treats document types and media types differently, and the code I was using previously (insert code) no longer works.
Been hacking away for some time and the only two possibilities I have left I don't really want to do. The first being to create a new media item, copy over the properties and then "save over" the original in the db, the other is to create a custom table and not worry about the umbraco API.
http://our.umbraco.org/documentation/Reference/management/Media/
I am sure there has to be an easier way to do this (hoping I am being thick).
Thanks for taking the time to read and respond!
You should be able to do exactly what you've already done, but replace the line:
Document doc = new Document(mediaItemId);
with
Media doc = new Media(mediaItemId);
You will of course have to make sure that your Media type has a "likes" property. This can be done in the "Settings > Media types" section of Umbraco in the same way that you can add properties to document types.
I have this code that creates a new Visio document and adds a rectangle. It works, but I don't like having to open another document to get the Masters collection from it. The issue is the new document has an empty Masters shape collection. I couldn't find a method in the Document class to add shapes to the Masters collection and all the examples I could find for adding shapes assumed you had an existing document. Is there a better way to do what I want?
// create the new application
Visio.Application va = new Microsoft.Office.Interop.Visio.Application();
// add a document
va.Documents.Add(@"");
Visio.Documents vdocs = va.Documents;
// we need this document to get its Masters shapes collection
// since our new document has none
Visio.Document vu = vdocs.OpenEx(@"C:\Program Files (x86)\Microsoft Office\Office12\1033\Basic_U.vss", (short)Microsoft.Office.Interop.Visio.VisOpenSaveArgs.visOpenDocked);
// set the working document to our new document
Visio.Document vd = va.ActiveDocument;
// set the working page to the active page
Microsoft.Office.Interop.Visio.Page vp = va.ActivePage;
// if we try this from the Masters collection from our new document
// we get a run-time error since our masters collection is empty
Visio.Master vm = vu.Masters.get_ItemU(@"Rectangle");
Visio.Shape visioRectShape = vp.Drop(vm, 4.25, 5.5);
visioRectShape.Text = @"Rectangle text.";
You're right - the Masters collection is ReadOnly. Documents normally start off with an empty masters collection. The collection gets populated by dropping masters from a stencil document.
If you want to create a new document with a pre-populated Masters collection then you could create your own template (.vst) and then base your new document on that. For example:
Visio.Document vDoc = vDocs.Add("MyTemplateFile.vst");
Normally you would package your stencils and templates together and then always create shapes by dropping a master from the respective stencil document (.vss).
Masters also have a MatchByName property. When you drop a master with this property set to true, Visio first checks whether a master of the same name exists in the drawing document's masters collection. If it does, an instance of that master is dropped. If not, a new master is added based on the original stencil. Have a look at these two links for more information:
http://msdn.microsoft.com/en-us/library/aa201768%28office.10%29.aspx
http://msdn.microsoft.com/en-us/library/ff766298.aspx
If you really want to create your own masters in code, you can draw / drop your own shapes on the page and then use the Document.Drop method to add it to the masters collection.
Also if you want to use a master by name then you'll need to loop through the masters collection to check that it exists before you use it.
I think you will find this on-line book extremely useful : http://msdn.microsoft.com/en-us/library/aa245244(v=office.10).aspx
This code
XmlDataDocument xmlDataDocument = new XmlDataDocument(ds);
does not work for me, because the node names are derived from the columns' encoded ColumnName property and will look like "last_x0020_name", for instance. I cannot use this in the resulting Excel spreadsheet. To turn the column names into something more friendly, I need to generate the XML myself.
I like LINQ to XML, and one of the responses to this question contained the following snippets:
XDocument doc = new XDocument(new XDeclaration("1.0", "UTF-8", "yes"),
    new XElement("products", from p in collection
        select new XElement("product",
            new XAttribute("guid", p.ProductId),
            new XAttribute("title", p.Title),
            new XAttribute("version", p.Version))));
The entire goal is to dynamically derive the column names from the dataset, so hardcoding them is not an option. Can this be done with Linq and without making the code much longer?
It ought to be possible.
In order to use your Dataset as a source you need Linq-to-Dataset.
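Then you would need a nested query:
// untested; needs System.Linq, System.Xml.Linq, System.Data and a reference to System.Data.DataSetExtensions
var table = ds.Tables["ProductsTable"];
var data = new XElement("products",
    from row in table.AsEnumerable()
    select new XElement("product",
        // Columns is a non-generic collection, so cast it before querying
        from column in table.Columns.Cast<DataColumn>()
        // the column name must be a valid XML element name, or XElement throws
        select new XElement(column.ColumnName, row[column])));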
I appreciate the answers, but I had to abandon this approach altogether. I did manage to produce the XML that I wanted (albeit not with Linq), but of course there is a reason why the default implementation of the XmlDataDocument constructor uses the EncodedColumnName - namely that special characters are not allowed in element names in XML. But since I wanted to use the XML to convert what used to be a simple CSV file to the XML Spreadsheet format using XSLT (customer complains about losing leading 0's in ZIP codes etc when loading the original CSV into Excel), I had to look into ways that preserve the data in Excel.
But the ultimate goal of this is to produce a CSV file for upload to the payroll processor, and they mandate the column names to be something that is not XML-compliant (e.g. "File #"). The data is reviewed by humans before the upload, and they use Excel.
I resorted to hard-coding the column names in the XSLT after all.
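For anyone who would rather turn the encoded element names back into readable column names than hard-code them, here is a minimal sketch. It assumes the `_xHHHH_` escape format that .NET's XmlConvert.EncodeName produces (a space becomes `_x0020_`). The class name `ColumnNames` and the method `decode` are hypothetical, and only four-digit escapes are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns "_xHHHH_" escapes back into the characters
// they stand for, e.g. "_x0020_" back into a space.
final class ColumnNames {

    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    static String decode(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // The four hex digits are the UTF-16 code unit of the original character.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this, `ColumnNames.decode("last_x0020_name")` gives `last name`. The decoded names are for display only; they still cannot be used as XML element names.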