Read custom property value from XMP with itextsharp 5.5.9 - c#

I'm having difficulties in reading a specific custom property from the XMP section of a PDF file, using itextsharp v. 5.5.9.
When I try to use the XmpReader class, it gets marked as obsolete, and it does not contain any public method that seems to be useful for reading purposes.
I can convert the Metadata section to an XML, and then parse it in some way (a workaround consists in using XmpCore library that has convenient methods for reading properties by name) but I'm sure I'm missing something...
I think it should be possible to just access some properties with just one library.
PdfReader reader = new PdfReader(inFile);
PdfStamper stamper = new PdfStamper(reader, new FileStream(outFile, FileMode.Create));
MemoryStream ms = null;
if (reader.Metadata != null)
ms = new MemoryStream(reader.Metadata);
else
{
stamper.CreateXmpMetadata();
ms = new MemoryStream();
}
XmpWriter xw = new XmpWriter(ms);
xw.XmpMeta.GetPropertyString(XmpConst.NS_DC, "MyProperty"); // -> not found, but it's ok for the first time...
xw.SetProperty(XmpConst.NS_DC, "MyProperty", "MyValue"); // -> OK
xw.XmpMeta.GetPropertyString(XmpConst.NS_DC, "MyProperty"); // -> OK
xw.Close();
stamper.XmpMetadata = ms.ToArray();
stamper.Close();
reader.Close();
If I run the program on the same file twice (so the property is saved in the file) the property is still not found..
How can I read the presence and value of MyProperty?

I ended up with this solution.
It requires XmpCore library, but it's easy and fast to implement, avoiding the explicit management of many details, such as encodings:
string result = null;
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(inFile);
if (reader.Metadata != null)
{
XmpCore.IXmpMeta meta = XmpCore.XmpMetaFactory.ParseFromBuffer(reader.Metadata);
result = meta.GetPropertyString(XmpConst.NS_DC, "MyProperty");
}
reader.Close();
return result;

If I don't get it wrong, you want to get customized property of pdf file's metadata.
If yes, you can do like this:
PdfReader reader = new PdfReader(inFile);
string myProperty = reader.Info.Where(x => x.Key == "MyProperty").Select(x => x.Value).FirstOrDefault();

Related

Can I add a CustomXmlPart to my presentation without storing the XML for it in a file?

TLDR; Please either confirm that the 2nd code snippit is the accepted method for creating a CustomXmlPart or show me another less tedious method.
So I'm trying to embed some application data in some of the elements in a .pptx that I'm modifying using the OpenXmlSDK.
To explain briefly, I need to embed an chart code into each image that is loaded into the presentation. It's so that the presentation can be re-uploaded and the charts can be generated again then replaced using the newest data.
Initially I was using Extended Attributes on the OpenXmlElement itself:
//OpenXmlDrawing = DocumentFormat.OpenXml.Drawing
// there's only one image per slide for now, so I just grab the blip which contains the image
OpenXmlDrawing.Blip blip = slidePart.Slide.Descendants<OpenXmlDrawing.Blip>().FirstOrDefault();
//then apply the attribute
blip.SetAttribute(new OpenXmlAttribute("customAttribute", null, customAttributeValue));
The issue with that being, when the .pptx is edited in PowerPoint 2013, it strips out all the Extended Attributes.
SO.
I've read in multiple places now that the solution is to use a CustomXmlPart.
So I was trying to find how to do it.. and it was looking like it would require me to have a separate file for each CustomXmlPart to feed into the part. Ex/
var customXmlPart = slidePart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (FileStream stream = new FileStream(fileName, FileMode.Open))
{
customXmlPart.FeedData(stream);
}
^ and that would need to be repeated with a different file for each CustomXmlPart. Which then means I'd likely just have to have a template file containing a skeleton custom XML part, and then dynamically fill in its contents for each individual slide before feeding it into the custom xml part.
It seems like a heck of a lot of work just to put in a little custom attribute. But I haven't been able to find any alternative methods.
Can anyone please either confirm that this is indeed the way I should do it, or point me in another direction? Greatly appreciated.
The answer is yes! :)
public class CustomXMLPropertyClass
{
public string PropertyName { get; set; }
public string PropertyValue { get; set; }
}
private static void AddCustomXmlPartCustomPropertyToSlidePart(string propertyName, string propertyValue, SlidePart part)
{
var customXmlPart = part.AddCustomXmlPart(CustomXmlPartType.CustomXml);
var customProperty = new CustomXMLPropertyClass{ PropertyName = propertyName, PropertyValue = propertyValue };
var serializer = new System.Xml.Serialization.XmlSerializer(customProperty.GetType());
var stream = new MemoryStream();
serializer.Serialize(stream, customProperty);
var customXml = System.Text.Encoding.UTF8.GetString(stream.ToArray());
using ( var streamWriter = new StreamWriter(customXmlPart.GetStream()))
{
streamWriter.Write(customXml);
streamWriter.Flush();
}
}
and then to get it back out:
private static string GetCustomXmlPropertyFromCustomXmlPart(CustomXmlPart customXmlPart)
{
var customXmlProperty = new CustomXMLPropertyClass();
string xml = "";
using (var stream = customXmlPart.GetStream())
{
var streamReader = new StreamReader(stream);
xml = streamReader.ReadToEnd();
}
using (TextReader reader = new StringReader(xml))
{
var serializer = new System.Xml.Serialization.XmlSerializer(typeof(customXmlProperty));
customXmlProperty = (CustomXMLPropertyClass)serializer.Deserialize(reader);
}
var customPropertyValue = customXmlProperty.PropertyValue;
return customPropertyValue;
}
You could also try custom properties. Custom XML files are meant for complex objects, and it sounds like you only need to store simple information.

AcroForm PDF to normal PDF in c#

I have an Acroform PDF (a PDF which can be edited) but I'm using an API to sign the PDF which requires that the PDF is a normal one and never an Acroform one.
Is there any way to transform an AcroForm PDF to a normal one?
I tried making it Read-Only but even though it cannot be edited it still is an Acroform PDF.
In answer to my comment, I assume you are using iTextSharp, even though you do not specify. Using iTextSharp, I believe you need to Flatten the form when you are done. Here is a simple example:
public void GeneratePDF(string filePath, List<PDFField> modifiedFields)
{
var pdfReader = new PdfReader(filePath);
var folderStructure = filePath.Split('\\');
if (folderStructure.Length == 0) return;
var currentFileName = folderStructure.Last();
var newFilePath = string.Format("{0}{1}", Constants.SaveFormsPath,
currentFileName.Replace(".pdf", DateTime.Now.ToString("MMddyyhhmmss") + ".pdf"));
var pdfStamper = new PdfStamper(pdfReader, new FileStream(newFilePath, FileMode.Create));
foreach (var field in modifiedFields.Where(f=>f.Value != null))
{
pdfStamper.AcroFields.SetField(field.Name, field.Value);
}
pdfStamper.FormFlattening = true;
pdfStamper.Close();
}
Ignoring the parts about the filename, it boils down to passing in some key value list regarding the field values to set. This could be where you do your signature piece, and then setting the FormFlattening property on the stamper to true.
Here is another SO post where they used a similiar technique for a slightly different issue, it may be of help: How to flatten already filled out PDF form using iTextSharp

Create PDF by copying it from template with PdfCopy (lost of data)

I'm trying to create a new pdf file based on another one using PdfCopy.
Everything work fine during generation and the generated file can be opened without any problem on my desktop, but the file seems to be corrupted and isn't accepted by the service that I must use :
SignService error when calling 'sign', probably caused by a bad file format.
I noticed that the generated pdf is always ligther than the original template, so i compared the template version with the generated one. There are some big parts of missing data, especially a whole bunch of xml. I guess PdfCopy does not copying every of my original pdf but i cannot figured out what am i missing.
here is my method :
byte[] completedDocument = null;
string originalUri = Path.Combine(this.PdfPath, pdfName);
string generatedUri = Path.Combine(this.PdfGeneratedPath, generatedPdfName);
using(MemoryStream streamCompleted = new MemoryStream())
{
using(Document doc = new Document())
{
PdfCopy copy = new PdfCopy(doc, streamCompleted);
copy.PdfVersion = PdfWriter.VERSION_1_6;
doc.Open();
copy.Open();
byte[] mergedDocument = null;
PdfReader pdfReader = new PdfReader(originalUri);
int pdfPageNumber = pdfReader.NumberOfPages;
using(MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, streamTemplate))
{
AcroFields acrofields = pdfStamper.AcroFields;
foreach (KeyValuePair<string, AcroFields.Item> field in acrofields.Fields)
{
string data;
if (pdfFieldsValues.TryGetValue(field.Key, out data))
{
if (data == null)
{
data = string.Empty;
}
acrofields.SetField(field.Key, data);
}
}
pdfStamper.FormFlattening = true;
pdfStamper.Writer.CloseStream = false;
}
mergedDocument = streamTemplate.ToArray();
}
pdfReader = new PdfReader(mergedDocument);
for (int page = 1; page <= pdfPageNumber; page++)
{
if (!excludedPages.Any(s => s == page))
{
copy.AddPage(copy.GetImportedPage(pdfReader, page));
}
}
doc.Close();
copy.Close();
}
completedDocument = streamCompleted.ToArray();
}
File.WriteAllBytes(generatedUri, completedDocument);
I tried to upload the "mergedDocument" rather than the "completedDocument" and my service accepting it, so i'm pretty sure it has something to do with this part :
for (int page = 1; page <= pdfPageNumber; page++)
{
if (!excludedPages.Any(s => s == page))
{
copy.AddPage(copy.GetImportedPage(pdfReader, page));
}
}
Or pdfCopy init
You start with a form. You fill out the form and you flatten it. By flattening it, you deliberately throw away all interactivity. I'm surprised that you're surprised that the file is getting smaller: you're throwing away the form infrastructure!
You then upload the flattened file to some service unknown to us. This service complains:
SignService error when calling 'sign', probably caused by a bad file format.
As we don't know which service you are talking about, we can only guess. An educated guess would be that the original form contains a signature field that needs to be signed by a signing service.
Obviously that field is gone: you flattened the form! I may be wrong, but I assume that the service also tries to read the fields you filled out, but that won't be possible either as you throw away all interactivity. Please remove the following line:
pdfStamper.FormFlattening = true;
Then there's Chris' comment: it seems that you're using PdfCopy. If you're using an old version of iTextSharp (before iText 5.5.1), you shouldn't expect the form to be preserved. If you are using a recent version, you should instruct PdfCopy to preserve the form (but that line is missing). You don't need to ask 'how do I preserve the form?' because you shouldn't be using PdfCopy anyway.
You only need PdfStamper. You already use PdfStamper to fill out the fields, now you can also use the selectPages() method to select the pages you want to keep (or to exclude the ones you want to remove).
Finally, it is unclear what you mean when you write:
There are some big parts of missing data, especially a whole bunch of xml.
Are you saying that the form isn't a pure AcroForm, but that it also contains an XFA stream? If so, then you most definitely can't use PdfCopy.

ITextSharp PDFTemplate FormFlattening removes filled data

I am porting an existing app from Java to C#. The original app used the IText library to fill PDF form templates and save them as new PDF's. My C# code (example) below:
string templateFilename = #"C:\Templates\test.pdf";
string outputFilename = #"C:\Output\demo.pdf";
using (var existingFileStream = new FileStream(templateFilename, FileMode.Open))
{
using (var newFileStream = new FileStream(outputFilename, FileMode.Create))
{
var pdfReader = new PdfReader(existingFileStream);
var stamper = new PdfStamper(pdfReader, newFileStream);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
foreach (string fieldKey in fieldKeys)
{
form.SetField(fieldKey, "REPLACED!");
}
stamper.FormFlattening = true;
stamper.Close();
pdfReader.Close();
}
}
All works well only if I ommit the
stamper.FormFlattening = true;
line, but then the forms are visible as...forms.
When I add the this line, any values set to the form fields are lost, resulting in a blank form. I would really appreciate any advice.
Most likely you can resolve this when using iTextSharp 5.4.4 (or later) by forcing iTextSharp to generate appearances for the form fields. In your example code:
var form = stamper.AcroFields;
form.GenerateAppearances = true;
Resolved the issue by using a previous version of ITextSharp (5.4.3). Not sure what the cause is though...
I found a working solution for this for any och the newer iTextSharp.
The way we do it was:
1- Create a copy of the pdf temmplate.
2- populate the copy with data.
3- FormFlatten = true and setFullCompression
4- Combine some of the PDFs to a new document.
5- Move the new combined document and then remove the temp.
This way we got the issue with removed input and if we skipped the "formflatten" it looked ok.
However when we moved the "FormFlatten = true" from step 3 and added it as a seperate step after the moving etc was complete, it worked perfectly.
Hope I explained somewhat ok :)
In your PDF File, change the property to Visible, the Default value is Visible but not printable.

Manipulating Word 2007 Document XML in C#

I am trying to manipulate the XML of a Word 2007 document in C#. I have managed to find and manipulate the node that I want but now I can't seem to figure out how to save it back. Here is what I am trying:
// Open the document from memoryStream
Package pkgFile = Package.Open(memoryStream, FileMode.Open, FileAccess.ReadWrite);
PackageRelationshipCollection pkgrcOfficeDocument = pkgFile.GetRelationshipsByType(strRelRoot);
foreach (PackageRelationship pkgr in pkgrcOfficeDocument)
{
if (pkgr.SourceUri.OriginalString == "/")
{
Uri uriData = new Uri("/word/document.xml", UriKind.Relative);
PackagePart pkgprtData = pkgFile.GetPart(uriData);
XmlDocument doc = new XmlDocument();
doc.Load(pkgprtData.GetStream());
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", nsUri);
XmlNodeList nodes = doc.SelectNodes("//w:body/w:p/w:r/w:t", nsManager);
foreach (XmlNode node in nodes)
{
if (node.InnerText == "{{TextToChange}}")
{
node.InnerText = "success";
}
}
if (pkgFile.PartExists(uriData))
{
// Delete template "/customXML/item1.xml" part
pkgFile.DeletePart(uriData);
}
PackagePart newPkgprtData = pkgFile.CreatePart(uriData, "application/xml");
StreamWriter partWrtr = new StreamWriter(newPkgprtData.GetStream(FileMode.Create, FileAccess.Write));
doc.Save(partWrtr);
partWrtr.Close();
}
}
pkgFile.Close();
I get the error 'Memory stream is not expandable'. Any ideas?
I would recommend that you use Open XML SDK instead of hacking the format by yourself.
Using OpenXML SDK 2.0, I do this:
public void SearchAndReplace(Dictionary<string, string> tokens)
{
using (WordprocessingDocument doc = WordprocessingDocument.Open(_filename, true))
ProcessDocument(doc, tokens);
}
private string GetPartAsString(OpenXmlPart part)
{
string text = String.Empty;
using (StreamReader sr = new StreamReader(part.GetStream()))
{
text = sr.ReadToEnd();
}
return text;
}
private void SavePart(OpenXmlPart part, string text)
{
using (StreamWriter sw = new StreamWriter(part.GetStream(FileMode.Create)))
{
sw.Write(text);
}
}
private void ProcessDocument(WordprocessingDocument doc, Dictionary<string, string> tokenDict)
{
ProcessPart(doc.MainDocumentPart, tokenDict);
foreach (var part in doc.MainDocumentPart.HeaderParts)
{
ProcessPart(part, tokenDict);
}
foreach (var part in doc.MainDocumentPart.FooterParts)
{
ProcessPart(part, tokenDict);
}
}
private void ProcessPart(OpenXmlPart part, Dictionary<string, string> tokenDict)
{
string docText = GetPartAsString(part);
foreach (var keyval in tokenDict)
{
Regex expr = new Regex(_starttag + keyval.Key + _endtag);
docText = expr.Replace(docText, keyval.Value);
}
SavePart(part, docText);
}
From this you could write a GetPartAsXmlDocument, do what you want with it, and then stream it back with SavePart(part, xmlString).
Hope this helps!
You should use the OpenXML SDK to work on docx files and not write your own wrapper.
Getting Started with the Open XML SDK 2.0 for Microsoft Office
Introducing the Office (2007) Open XML File Formats
How to: Manipulate Office Open XML Formats Documents
Manipulate Docx with C# without Microsoft Word installed with OpenXML SDK
The problem appears to be doc.Save(partWrtr), which is built using newPkgprtData, which is built using pkgFile, which loads from a memory stream... Because you loaded from a memory stream it's trying to save the document back to that same memory stream. This leads to the error you are seeing.
Instead of saving it to the memory stream try saving it to a new file or to a new memory stream.
The short and simple answer to the issue with getting 'Memory stream is not expandable' is:
Do not open the document from memoryStream.
So in that respect the earlier answer is correct, simply open a file instead.
Opening from MemoryStream editing the document (in my experience) easy lead to 'Memory stream is not expandable'.
I suppose the message appears when one do edits that requires the memory stream to expand.
I have found that I can do some edits but not anything that add to the size.
So, f.ex deleting a custom xml part is ok but adding one and some data is not.
So if you actually need to open a memory stream you must figure out how to open an expandable MemoryStream if you want to add to it.
I have a need for this and hope to find a solution.
Stein-Tore Erdal
PS: just noticed the answer from "Jan 26 '11 at 15:18".
Don't think that is the answer in all situations.
I get the error when trying this:
var ms = new MemoryStream(bytes);
using (WordprocessingDocument wd = WordprocessingDocument.Open(ms, true))
{
...
using (MemoryStream msData = new MemoryStream())
{
xdoc.Save(msData);
msData.Position = 0;
ourCxp.FeedData(msData); // Memory stream is not expandable.

Categories