I've seen many posts that have helped me get to where I am, I'm new to programming. My intention is to get the files within the directory "sourceDir" and look for a Regex Match. When it finds a Match, I want to create a new file with the Match as the name. If the code finds another file with the same Match (the file already exists) then create a new page within that document.
Right now the code works, however instead of adding a new page, it overwrites the first page of the document. NOTE: Every document in the directory is only one page!
string sourceDir = #"C:\Users\bob\Desktop\results\";
string destDir = #"C:\Users\bob\Desktop\results\final\";
string[] files = Directory.GetFiles(sourceDir);
foreach (string file in files)
{
using (var pdfReader = new PdfReader(file.ToString()))
{
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
var text = new StringBuilder();
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
var currentText =
PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
Regex reg = new Regex(#"ABCDEFG");
MatchCollection matches = reg.Matches(currentText);
foreach (Match m in matches)
{
string newFile = destDir + m.ToString() + ".pdf";
if (!File.Exists(newFile))
{
using (PdfReader reader = new PdfReader(File.ReadAllBytes(file)))
{
using (Document doc = new Document(reader.GetPageSizeWithRotation(page)))
{
using (PdfCopy copy = new PdfCopy(doc, new FileStream(newFile, FileMode.Create)))
{
var importedPage = copy.GetImportedPage(reader, page);
doc.Open();
copy.AddPage(importedPage);
doc.Close();
}
}
}
}
else
{
using (PdfReader reader = new PdfReader(File.ReadAllBytes(newFile)))
{
using (Document doc = new Document(reader.GetPageSizeWithRotation(page)))
{
using (PdfCopy copy = new PdfCopy(doc, new FileStream(newFile, FileMode.OpenOrCreate)))
{
var importedPage = copy.GetImportedPage(reader, page);
doc.Open();
copy.AddPage(importedPage);
doc.Close();
}
}
}
}
}
}
}
}
Bruno did a great job explaining the problem and how to fix it but since you've said that you are new to programming and you've further posted a very similar and related question I'm going to go a little deeper to hopefully help you.
First, let's write down the knowns:
There's a directory full of PDFs
Each PDF has only a single page
Then the objectives:
Extract the text of each PDF
Compare the extracted text with a pattern
If there's a match, then using the match for a file name do one of:
If a file exists append the source PDF to it
If there isn't a match, create a new file with the PDF
There's a couple of things that you need to know before proceeding. You tried to work in "append mode" by using FileMode.OpenOrCreate. It was a good guess but incorrect. The PDF format has both an beginning and an end, so "start here" and "end here". When you attempt to append another PDF (or anything for that matter) to an existing file you are just writing past the "end here" section. At best, that's junk data that gets ignored but more likely you'll end up with a corrupt PDF. The same is true of almost any file format. Two XML files concatenated is invalid because an XML document can only have one root element.
Second but related, iText/iTextSharp cannot edit existing files. This is very important. It can, however, create brand new files that happen to have the exact or possibly modified versions of other files. I don't know if I can stress how important this is.
Third, you are using a line that get's copied over and over again but is very wrong and actually can corrupt your data. For why it is bad, read this.
currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
Fourth, you are using RegEx which is an overly complicated way to perform a search. Maybe the code that you posted was just a sample but if it wasn't I would recommend just using currentText.Contains("") or if you need to ignore case currentText.IndexOf( "", StringComparison.InvariantCultureIgnoreCase ). For the benefit of the doubt, the code below assumes you have a more complex RegEx.
With all that, below is a full working example that should walk you through everything. Since we don't have access to your PDFs, the second section actually creates 100 sample PDFs with our search terms occasionally added to them. Your real code obviously wouldn't do this but we need common ground to work with you on. The third section is the search and merge feature that you are trying to do. Hopefully the comments in the code explain everything.
/**
* Step 1 - Variable Setup
*/
//This is the folder that we'll be basing all other directory paths on
var workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//This folder will hold our PDFs with text that we're searching for
var folderPathContainingPdfsToSearch = Path.Combine(workingFolder, "Pdfs");
var folderPathContainingPdfsCombined = Path.Combine(workingFolder, "Pdfs Combined");
//Create our directories if they don't already exist
System.IO.Directory.CreateDirectory(folderPathContainingPdfsToSearch);
System.IO.Directory.CreateDirectory(folderPathContainingPdfsCombined);
var searchText1 = "ABC";
var searchText2 = "DEF";
/**
* Step 2 - Create sample PDFs
*/
//Create 100 sample PDFs
for (var i = 0; i < 100; i++) {
using (var fs = new FileStream(Path.Combine(folderPathContainingPdfsToSearch, i.ToString() + ".pdf"), FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//Add a title so we know what page we're on when we combine
doc.Add(new Paragraph(String.Format("This is page {0}", i)));
//Add various strings every once in a while.
//(Yes, I know this isn't evenly distributed but I haven't
// had enough coffee yet.)
if (i % 10 == 3) {
doc.Add(new Paragraph(searchText1));
} else if (i % 10 == 6) {
doc.Add(new Paragraph(searchText2));
} else if (i % 10 == 9) {
doc.Add(new Paragraph(searchText1 + searchText2));
} else {
doc.Add(new Paragraph("Blah blah blah"));
}
doc.Close();
}
}
}
}
/**
* Step 3 - Search and merge
*/
//We'll search for two different strings just to add some spice
var reg = new Regex("(" + searchText1 + "|" + searchText2 + ")");
//Loop through each file in the directory
foreach (var filePath in Directory.EnumerateFiles(folderPathContainingPdfsToSearch, "*.pdf")) {
using (var pdfReader = new PdfReader(filePath)) {
for (var page = 1; page <= pdfReader.NumberOfPages; page++) {
//Get the text from the page
var currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, new SimpleTextExtractionStrategy());
currentText.IndexOf( "", StringComparison.InvariantCultureIgnoreCase )
//DO NOT DO THIS EVER!! See this for why https://stackoverflow.com/a/10191879/231316
//currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
//Match our pattern against the extracted text
var matches = reg.Matches(currentText);
//Bail early if we can
if (matches.Count == 0) {
continue;
}
//Loop through each match
foreach (var m in matches) {
//This is the file path that we want to target
var destFile = Path.Combine(folderPathContainingPdfsCombined, m.ToString() + ".pdf");
//If the file doesn't already exist then just copy the file and move on
if (!File.Exists(destFile)) {
System.IO.File.Copy(filePath, destFile);
continue;
}
//The file exists so we're going to "append" the page
//However, writing to the end of file in Append mode doesn't work,
//that would be like "add a file to a zip" by concatenating two
//two files. In this case, we're actually creating a brand new file
//that "happens" to contain the original file and the matched file.
//Instead of writing to disk for this new file we're going to keep it
//in memory, delete the original file and write our new file
//back onto the old file
using (var ms = new MemoryStream()) {
//Use a wrapper helper provided by iText
var cc = new PdfConcatenate(ms);
//Open for writing
cc.Open();
//Import the existing file
using (var subReader = new PdfReader(destFile)) {
cc.AddPages(subReader);
}
//Import the matched file
//The OP stated a guarantee of only 1 page so we don't
//have to mess around with specify which page to import.
//Also, PdfConcatenate closes the supplied PdfReader so
//just use the variable pdfReader.
using (var subReader = new PdfReader(filePath)) {
cc.AddPages(subReader);
}
//Close for writing
cc.Close();
//Erase our exisiting file
File.Delete(destFile);
//Write our new file
File.WriteAllBytes(destFile, ms.ToArray());
}
}
}
}
}
I'll write this in pseudo code.
You do something like this:
// loop over different single-page documents
for () {
// introduce a condition
if (condition == met) {
// create single-page PDF
new Document();
new PdfCopy();
document.Open();
copy.add(singlePage);
document.Close();
}
}
This means that you are creating a single-page PDF every time the condition is met. Incidentally, you're overwriting existing files many times.
What you should do, is something like this:
// Create a document with as many pages as times a condition is met
new Document();
new PdfCopy();
document.Open();
// loop over different single-page documents
for () {
// introduce a condition
if (condition == met) {
copy.addPage(singlePage);
}
}
document.Close();
Now you are possibly adding more than one page to the new document you are creating with PdfCopy. Be careful: an exception can be thrown if the condition is never met.
Related
I am new to PDF creation and I am following the existing code to create a pdf file. I am amending the existing code by creating a new pdf.
From the list of the documents, I am looping through each document and give it a new name.
How do I create a new PDF file every time I iterate or loop through the documents list?
resultCollection - Got the list of the documents
currentCompanySegmentsSetings - An object with the details
creatingTableOfContent - table content
What I have tried
foreach (var item in resultCollection)
{
var guidID = Guid.NewGuid().ToString();
var newFileName = $"{currentCompanySegmentsSetings.FriendlySegmentName}-{Translator.TranslateDocumentType("invoice", currentCompanySegmentsSetings).ToLower()}-{guidID}-{message.documents.First().AccountNumber}.pdf";
outputFileNames.Add(newFileName);
//Create PDF's and send to the location
System.IO.Directory.CreateDirectory(currentOutputDirectory);
var firstDocsMetadata = resultCollection.First().MetaData;
string generatedPDFLocation = System.IO.Path.Combine(currentOutputDirectory, newFileName);
var file = DocumentsToPDFDocs(financialDocument);
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(generatedPDFLocation, CreateEncryptionWriteProperties()));
Document doc = new Document(pdfDoc);
//Bookmarks
pdfDoc.GetCatalog().SetPageMode(PdfName.UseOutlines);
doc.SetMargins(22f, 22f, 22f, 22f);
doc.SetFontSize(8);
doc.SetFontColor(Color.BLACK);
//1.Create table of contents
var tableOfContentTopMargin = 176;
Table tableOfContent = creatingTableOfContent(file, currentCompanySegmentsSetings);
tableOfContent.SetDestination("p" + "index");
doc.Add(tableOfContent.SetMarginTop(tableOfContentTopMargin));
//How do i continue from here to create pdf to the directory
message.FinancialDocumentAttachments.Add(new MessageQueueAttachment()
{
Location = documentPath,
IsNew = true,
Id = Guid.NewGuid()
});
}
I assume you want to create a separate PDF file for each item in resultCollection.
You already generate a filename for each file:
var newFileName = ...
string generatedPDFLocation = System.IO.Path.Combine(currentOutputDirectory, newFileName);
And you make a new PdfDocument and Document instance, to be written to a file with that filename:
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(generatedPDFLocation,
CreateEncryptionWriteProperties()));
Document doc = new Document(pdfDoc);
Then just add content to the Document instance (or to the PdfDocument). I assume that content is in the item or resultCollection somehow.
doc.add(new Paragraph("content for this document"));
Finally, close the document which will flush the PDF file to disk.
doc.close();
The next iteration of the foreach loop will generate a different filename and thus the next document will be written to a different file.
You're using:
var firstDocsMetadata = resultCollection.First().MetaData;
When it should be:
var firstDocsMetadata = item.MetaData;
Since you're iterating the resultCollection and item is the element that is obtained in each loop.
I have a word document which contain many pages. One of those pages contain a placeholder instead of other content. so I want to replace that placeholder with another doc file without losing formatting. This doc file which is to be replaced may have many pages. How can I replace that placeholder with this doc file programmatically.. I searched many but could not find any option to insert a doc file replacing a placeholder.. Thank You In Advance.
Or how can we copy the contents of doc to be inserted and then replace the placeholder with copied content
I found a post here.The below code is from that post.
With the library, you can do the following to replace text from a Word document, considering that documentByteArray is your document byte content taken from database:
using (MemoryStream mem = new MemoryStream())
{
mem.Write(documentByteArray, 0, (int)documentByteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
if instead of "Hi Everyone" if we replace it with a binarydata,which is an array of bytes
byte[] binarydata = File.ReadAllBytes(filepaths);
how can we modify the program?
First of all you should get a Nuget package called Novacode.Docx, this is what I have found to be the best Document creator and editor in the last few years.
using Novacode.Docx;
void Main()
{
var doc = DocX.Load(#"c:\temp\existingDoc.docx");
var docToAdd = DocX.Load(#"c:\temp\docToAdd.docx");
doc.InsertDocument(docToAdd, true); //version 1.0.0.22
doc.InsertDocument(docToAdd); //version 1.0.0.19
}
this is the most simple and basic implementation of what it is that youre after but this works.
for anything else take a look at the documentation at
https://docx.codeplex.com/
or
http://cathalscorner.blogspot.co.uk/
this will be the best place to start. I would also recommend that if you do use this one that you use the version 1.0.0.19 as there are some formatting issues in 1.0.0.22
I've recently used iTextSharp to create a PDF by importing the 20 pages from an existing PDF and then adding a dynamically generated link to the bottom of the last page. It works fine... kind of. Viewing the generated PDF in Acrobat Reader on a windows PC displays everything as expected although when closing the document it always asks "Do you want to save changes?". Viewing the generated PDF on a Surface Pro with PDF Reader displays the document without the first and last pages. Apparently on a mobile device using Polaris Office the first and last pages are also missing.
I'm wondering if when the new PDF is generated it's not getting closed off quite properly and that's why it asks "Do you want to save changes?" when closing it. And maybe that's also why it doesn't display correctly in some PDF reader apps.
Here's the code:
using (var reader = new PdfReader(HostingEnvironment.MapPath("~/app/pdf/OriginalDoc.pdf")))
{
using (
var fileStream =
new FileStream(
HostingEnvironment.MapPath("~/documents/attachments/DocWithLink_" + id + ".pdf"),
FileMode.Create, FileAccess.Write))
{
var document = new Document(reader.GetPageSizeWithRotation(1));
var writer = PdfWriter.GetInstance(document, fileStream);
using (PdfStamper stamper = new PdfStamper(reader, fileStream))
{
var baseFont = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252,
BaseFont.NOT_EMBEDDED);
Font linkFont = FontFactory.GetFont("Arial", 12, Font.UNDERLINE, BaseColor.BLUE);
document.Open();
for (var i = 1; i <= reader.NumberOfPages; i++)
{
document.NewPage();
var importedPage = writer.GetImportedPage(reader, i);
// Copy page of original document to new document.
var contentByte = writer.DirectContent;
contentByte.AddTemplate(importedPage, 0, 0);
if (i == reader.NumberOfPages) // It's the last page so add link.
{
PdfContentByte cb = stamper.GetOverContent(i);
//Create a ColumnText object
var ct = new ColumnText(cb);
//Set the rectangle to write to
ct.SetSimpleColumn(100, 30, 500, 90, 0, PdfContentByte.ALIGN_LEFT);
//Add some text and make it blue so that it looks like a hyperlink
var c = new Chunk("Click here!", linkFont);
var congrats = new Paragraph("Congratulations on reading the eBook! ");
congrats.Alignment = PdfContentByte.ALIGN_LEFT;
c.SetAnchor("http://www.domain.com/pdf/response/" + encryptedId);
//Add the chunk to the ColumnText
congrats.Add(c);
ct.AddElement(congrats);
//Tell the system to process the above commands
ct.Go();
}
}
}
}
}
I've looked at these posts with similar issues but none seem to quite provide the answer I need:
iTextSharp-generated PDFs cause save dialog when closing
Using iTextSharp to write data to PDF works great, but Acrobat Reader asks 'Do you want to save changes' when closing file
(Or they refer to memory streams instead of writing to disk etc)
My question is, how do I modify the above so that when closing the generated PDF in Acrobat Reader there's no "Do you want to save changes?" prompt. The answer to that may solve the problems with missing pages on Surface Pro etc but if you know anything else about what might be causing that I'd like to hear about it.
Any suggestions would be very welcome! Thanks!
At first glance (and without much coffee yet) it appears that you're using a PdfReader in three different contexts, as a source to a PdfStamper, as a source for Document and as for a source for importing. So you are essentially importing a document into itself that you're also writing to.
To give you a quick overview, the following code will essentially clone the contents of source.pdf into dest.pdf:
using (var reader = new PdfReader("source.pdf")){
using (var fileStream = new FileStream("dest.pdf", FileMode.Create, FileAccess.Write)){
using (PdfStamper stamper = new PdfStamper(reader, fileStream)){
}
}
}
Since that does all of the cloning for you you don't need to import pages or anything.
Then, if the only thing that you want to do is add some text to the last page, you can just use the above and ask the PdfStamper for a PdfContentByte using GetOverContent() and telling it what page number you're interested. Then you can just use the rest of your ColumnText logic.
using (var reader = new PdfReader("Source.Pdf")) {
using (var fileStream = new FileStream("Dest.Pdf"), FileMode.Create, FileAccess.Write) {
using (PdfStamper stamper = new PdfStamper(reader, fileStream)) {
//Get a PdfContentByte object
var cb = stamper.GetOverContent(reader.NumberOfPages);
//Create a ColumnText object
var ct = new ColumnText(cb);
//Set the rectangle to write to
ct.SetSimpleColumn(100, 30, 500, 90, 0, PdfContentByte.ALIGN_LEFT);
//Add some text and make it blue so that it looks like a hyperlink
var c = new Chunk("Click here!", linkFont);
var congrats = new Paragraph("Congratulations on reading the eBook! ");
congrats.Alignment = PdfContentByte.ALIGN_LEFT;
c.SetAnchor("http://www.domain.com/pdf/response/" + encryptedId);
//Add the chunk to the ColumnText
congrats.Add(c);
ct.AddElement(congrats);
//Tell the system to process the above commands
ct.Go();
}
}
}
I have generated a word file using Open Xml and I need to send it as attachment in a email with pdf format but I cannot save any physical pdf or word file on disk because I develop my application in cloud environment(CRM online).
I found only way is "Aspose Word to .Net".
http://www.aspose.com/docs/display/wordsnet/How+to++Convert+a+Document+to+a+Byte+Array But it is too expensive.
Then I found a solution is to convert word to html, then convert html to pdf. But there is a picture in my word. And I cannot resolve the issue.
The most accurate conversion from DOCX to PDF is going to be through Word. Your best option for that is setting up a server with OWAS (Office Web Apps Server) and doing your conversion through that.
You'll need to set up a WOPI endpoint on your application server and call:
/wv/WordViewer/request.pdf?WOPISrc={WopiUrl}&type=downloadpdf
OR
/wv/WordViewer/request.pdf?WOPISrc={WopiUrl}&type=printpdf
Alternatively you could try and do it using OneDrive and Word Online, but you'll need to work out the parameters Word Online uses as well as whether that's permitted within the Ts & Cs.
You can try Gnostice XtremeDocumentStudio .NET.
Converting From DOCX To PDF Using XtremeDocumentStudio .NET
http://www.gnostice.com/goto.asp?id=24900&t=convert_docx_to_pdf_using_xdoc.net
In the published article, conversion has been demonstrated to save to a physical file. You can use documentConverter.ConvertToStream method to convert a document to a Stream as shown below in the code snippet.
DocumentConverter documentConverter = new DocumentConverter();
// input can be a FilePath, Stream, list of FilePaths or list of Streams
Object input = "InputDocument.docx";
string outputFileFormat = "pdf";
ConversionMode conversionMode = ConversionMode.ConvertToSeperateFiles;
List<Stream> outputStreams = documentConverter.ConvertToStream(input, outputFileFormat, conversionMode);
Disclaimer: I work for Gnostice.
If you wanna convert bytes array, then to use Metamorphosis:
string docxPath = #"example.docx";
string pdfPath = Path.ChangeExtension(docxPath, ".pdf");
byte[] docx = File.ReadAllBytes(docxPath);
// Convert DOCX to PDF in memory
byte[] pdf = p.DocxToPdfConvertByte(docx);
if (pdf != null)
{
// Save the PDF document to a file for a viewing purpose.
File.WriteAllBytes(pdfPath, pdf);
System.Diagnostics.Process.Start(pdfPath);
}
else
{
System.Console.WriteLine("Conversion failed!");
Console.ReadLine();
}
I have recently used SautinSoft 'Document .Net' library to convert docx to pdf in my React(frontend), .NET core(micro services- backend) application. It only take 15 seconds to generate a pdf having 23 pages. This 15 seconds includes getting data from database, then merging data with docx template and then converting it to pdf. The code has deployed to azure Linux box and works fine.
https://sautinsoft.com/products/document/
Sample code
public string GeneratePDF(PDFDocumentModel document)
{
byte[] output = null;
using (var outputStream = new MemoryStream())
{
// Create single pdf.
DocumentCore singlePDF = new DocumentCore();
var documentCores = new List<DocumentCore>();
foreach (var section in document.Sections)
{
documentCores.Add(GenerateDocument(section));
}
foreach (var dc in documentCores)
{
// Create import session.
ImportSession session = new ImportSession(dc, singlePDF, StyleImportingMode.KeepSourceFormatting);
// Loop through all sections in the source document.
foreach (Section sourceSection in dc.Sections)
{
// Because we are copying a section from one document to another,
// it is required to import the Section into the destination document.
// This adjusts any document-specific references to styles, bookmarks, etc.
// Importing a element creates a copy of the original element, but the copy
// is ready to be inserted into the destination document.
Section importedSection = singlePDF.Import<Section>(sourceSection, true, session);
// First section start from new page.
if (dc.Sections.IndexOf(sourceSection) == 0)
importedSection.PageSetup.SectionStart = SectionStart.NewPage;
// Now the new section can be appended to the destination document.
singlePDF.Sections.Add(importedSection);
//Paging
HeaderFooter footer = new HeaderFooter(singlePDF, HeaderFooterType.FooterDefault);
// Create a new paragraph to insert a page numbering.
// So that, our page numbering looks as: Page N of M.
Paragraph par = new Paragraph(singlePDF);
par.ParagraphFormat.Alignment = HorizontalAlignment.Center;
CharacterFormat cf = new CharacterFormat() { FontName = "Consolas", Size = 11.0 };
par.Content.Start.Insert("Page ", cf.Clone());
// Page numbering is a Field.
Field fPage = new Field(singlePDF, FieldType.Page);
fPage.CharacterFormat = cf.Clone();
par.Content.End.Insert(fPage.Content);
par.Content.End.Insert(" of ", cf.Clone());
Field fPages = new Field(singlePDF, FieldType.NumPages);
fPages.CharacterFormat = cf.Clone();
par.Content.End.Insert(fPages.Content);
footer.Blocks.Add(par);
importedSection.HeadersFooters.Add(footer);
}
}
var pdfOptions = new PdfSaveOptions();
pdfOptions.Compression = false;
pdfOptions.EmbedAllFonts = false;
pdfOptions.EmbeddedImagesFormat = PdfSaveOptions.EmbImagesFormat.Png;
pdfOptions.EmbeddedJpegQuality = 100;
//dont allow editing after population, also ensures content can be printed.
pdfOptions.PreserveFormFields = false;
pdfOptions.PreserveContentControls = false;
if (!string.IsNullOrEmpty(document.PdfProperties.Title))
{
singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Title] = document.PdfProperties.Title;
}
if (!string.IsNullOrEmpty(document.PdfProperties.Author))
{
singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Author] = document.PdfProperties.Author;
}
if (!string.IsNullOrEmpty(document.PdfProperties.Subject))
{
singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Subject] = document.PdfProperties.Subject;
}
singlePDF.Save(outputStream, pdfOptions);
output = outputStream.ToArray();
}
return Convert.ToBase64String(output);
}
I'm trying to create a new pdf file based on another one using PdfCopy.
Everything work fine during generation and the generated file can be opened without any problem on my desktop, but the file seems to be corrupted and isn't accepted by the service that I must use :
SignService error when calling 'sign', probably caused by a bad file format.
I noticed that the generated pdf is always ligther than the original template, so i compared the template version with the generated one. There are some big parts of missing data, especially a whole bunch of xml. I guess PdfCopy does not copying every of my original pdf but i cannot figured out what am i missing.
here is my method :
byte[] completedDocument = null;
string originalUri = Path.Combine(this.PdfPath, pdfName);
string generatedUri = Path.Combine(this.PdfGeneratedPath, generatedPdfName);
using(MemoryStream streamCompleted = new MemoryStream())
{
using(Document doc = new Document())
{
PdfCopy copy = new PdfCopy(doc, streamCompleted);
copy.PdfVersion = PdfWriter.VERSION_1_6;
doc.Open();
copy.Open();
byte[] mergedDocument = null;
PdfReader pdfReader = new PdfReader(originalUri);
int pdfPageNumber = pdfReader.NumberOfPages;
using(MemoryStream streamTemplate = new MemoryStream())
{
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, streamTemplate))
{
AcroFields acrofields = pdfStamper.AcroFields;
foreach (KeyValuePair<string, AcroFields.Item> field in acrofields.Fields)
{
string data;
if (pdfFieldsValues.TryGetValue(field.Key, out data))
{
if (data == null)
{
data = string.Empty;
}
acrofields.SetField(field.Key, data);
}
}
pdfStamper.FormFlattening = true;
pdfStamper.Writer.CloseStream = false;
}
mergedDocument = streamTemplate.ToArray();
}
pdfReader = new PdfReader(mergedDocument);
for (int page = 1; page <= pdfPageNumber; page++)
{
if (!excludedPages.Any(s => s == page))
{
copy.AddPage(copy.GetImportedPage(pdfReader, page));
}
}
doc.Close();
copy.Close();
}
completedDocument = streamCompleted.ToArray();
}
File.WriteAllBytes(generatedUri, completedDocument);
I tried to upload the "mergedDocument" rather than the "completedDocument" and my service accepting it, so i'm pretty sure it has something to do with this part :
for (int page = 1; page <= pdfPageNumber; page++)
{
if (!excludedPages.Any(s => s == page))
{
copy.AddPage(copy.GetImportedPage(pdfReader, page));
}
}
Or pdfCopy init
You start with a form. You fill out the form and you flatten it. By flattening it, you deliberately throw away all interactivity. I'm surprised that you're surprised that the file is getting smaller: you're throwing away the form infrastructure!
You then upload the flattened file to some service unknown to us. This service complains:
SignService error when calling 'sign', probably caused by a bad file format.
As we don't know which service you are talking about, we can only guess. An educated guess would be that the original form contains a signature field that needs to be signed by a signing service.
Obviously that field is gone: you flattened the form! I may be wrong, but I assume that the service also tries to read the fields you filled out, but that won't be possible either as you throw away all interactivity. Please remove the following line:
pdfStamper.FormFlattening = true;
Then there's Chris' comment: it seems that you're using PdfCopy. If you're using an old version of iTextSharp (before iText 5.5.1), you shouldn't expect the form to be preserved. If you are using a recent version, you should instruct PdfCopy to preserve the form (but that line is missing). You don't need to ask 'how do I preserve the form?' because you shouldn't be using PdfCopy anyway.
You only need PdfStamper. You already use PdfStamper to fill out the fields, now you can also use the selectPages() method to select the pages you want to keep (or to exclude the ones you want to remove).
Finally, it is unclear what you mean when you write:
There are some big parts of missing data, especially a whole bunch of xml.
Are you saying that the form isn't a pure AcroForm, but that it also contains an XFA stream? If so, then you most definitely can't use PdfCopy.