File.WriteAllBytes generates one pdf fine, but multiple pdfs with errors? - c#

I have a loop that goes through some data and generates PDF files. If I generate one PDF by itself, it works just fine (the PDF opens), but if I create two PDF files, the first one opens fine while the second one displays an error saying the file is corrupt or something similar. Is there something I am doing wrong in the loop with the stream, etc.?
foreach (report r in reports)
{
byte[] pdf;
ReportName = r.ReportName;
switch (r.ReportId.ToLower())
{
case "pdf":
pdfBuilder = new pdfHelper(candidate,
pdfTemplates[(Guid)case_report.TemplateId], r.XMLFieldData, DCFormats,
r.ProjectReportName, dependants, DepCount, SpoCount);
pdf = pdfBuilder.GenerateCasePDF();
break;
}
//Add Bookmarks for each report in candidate
ChapterCount++;
ChapterReport = new Chapter(new Paragraph(case_report.ReportName), ChapterCount);
tDoc.Add(ChapterReport);
reader = new PdfReader(pdf);
n = reader.NumberOfPages;
for (int page = 1; page <= n; page++)
copy.AddPage(copy.GetImportedPage(reader, page));
copy.FreeReader(reader);
reader.Close();
}
//Save pdf to folder
ReportName = null;
tDoc.Close();
PubResult = outputStream.ToArray();
File.WriteAllBytes(string.Format(@"{0}\{1}.pdf", JobRootPath, CaseFileName), PubResult);
//Reset for next case
outputStream = new MemoryStream();
tDoc = new iTextSharp.text.Document();
copy = new PdfSmartCopy(tDoc, outputStream);
copy.ViewerPreferences = PdfWriter.PageModeUseOutlines;
copy.SetFullCompression();
tDoc.Open();
}

I'm guessing ChapterCount should get reset to its initial value like all the other variables you're resetting at the end of your for loop.
That aside, I'd recommend moving the body of your for loop, and all related variables, into a new method. Reusing variables tends to lead to errors like this.

Related

Write to a PDF file with fields multiple times using iTextSharp

I have a PDF document with 3 Fields txt_FirstName, txt_MiddleName and txt_LastName that I write into using iTextSharp.
I have a loop that creates the output file, writes to it, and closes the file.
The first time in the loop the file writes the first name and the middle name.
The second time in the loop the file should have the first name, middle name, and write the last name.
Issue: On the second pass through the loop, when the last name is written, the first and middle names disappear.
Goal: The main thing I want to do is write to the same PDF document multiple times.
Download PDF template: https://www.scribd.com/document/412586469/Testing-Doc
public static string templatePath = "C:\\temp\\template.pdf";
public static string OutputPath = "C:\\Output\\";
private static void Fill_PDF()
{
string outputFile = "output.pdf";
int counter = 1;
for (int i = 0; i < 2; i++)
{
PdfStamper pdfStamper;
PdfReader reader;
reader = new PdfReader(File.ReadAllBytes(templatePath));
PdfReader.unethicalreading = true;
if (File.Exists(OutputPath + outputFile))
{
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Append, FileAccess.Write));
}
else
{
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Create));
}
AcroFields pdfFormFields = pdfStamper.AcroFields;
if (counter == 1)
{
pdfFormFields.SetField("txt_FirstName", "Scooby");
pdfFormFields.SetField("txt_MiddleName", "Dooby");
counter++;
}
else if (counter == 2)
{
pdfFormFields.SetField("txt_LastName", "Doo");
}
pdfStamper.Close();
}
}
This seems like a straightforward bug. The first time through the loop, you load up the blank template and write the first and middle name. The second time through the loop, you load up the blank template again and write only the last name to it, then save to the same filename, overwriting it. If, during the second time through the loop, you want to load the file that already contains the first and middle name, you have to load up the output file you wrote the first time around, not the blank template again. Or if you want to load the blank template again, inside your if (counter == 2) clause, you're going to have to write all 3 names, not just the last name.
I reproduced your bug, and got it working. Here's the code to the first solution I described (minor modification of your code):
public static string templatePath = "C:\\temp\\template.pdf";
public static string OutputPath = "C:\\temp\\output\\";
private static void Fill_PDF()
{
string outputFile = "output.pdf";
int counter = 1;
for (int i = 0; i < 2; i++)
{
PdfStamper pdfStamper;
PdfReader reader = null;
/********** here's the changed part */
if (counter == 1)
{
reader = new PdfReader(File.ReadAllBytes(templatePath));
} else if (counter == 2)
{
reader = new PdfReader(File.ReadAllBytes(OutputPath + outputFile));
}
/************ end changed part */
PdfReader.unethicalreading = true;
if (File.Exists(OutputPath + outputFile))
{
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Append, FileAccess.Write));
}
else
{
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Create));
}
AcroFields pdfFormFields = pdfStamper.AcroFields;
if (counter == 1)
{
pdfFormFields.SetField("txt_FirstName", "Scooby");
pdfFormFields.SetField("txt_MiddleName", "Dooby");
counter++;
}
else if (counter == 2)
{
pdfFormFields.SetField("txt_LastName", "Doo");
}
pdfStamper.Close();
}
}
There are two major issues with the code. @Nick already pointed out the first in his answer: if in your second pass you want to edit a version of your document containing the changes from the first pass, you have to take the output document of the first pass as the input of the second pass, not the original template again. He also presented code that fixes this issue.
The second issue is located here:
if (File.Exists(OutputPath + outputFile))
{
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Append, FileAccess.Write));
}
else
{
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Create));
}
If the output file already exists, you append the output of your PdfStamper to it. This is wrong! The output of the PdfStamper already contains the contents of the original PDF (from the PdfReader) as far as they have not been changed. Thus, your code effectively produces a concatenation of the complete output PDF of the first pass and the complete output PDF of the second pass.
PDF is a binary format for which concatenating files like that does not result in a valid PDF file. Thus, a PDF viewer loading your final result tries to repair this double PDF assuming it is a single one. The result may or may not look like you want.
To fix the second issue, simply replace the if{...}else{...} above by the contents of the else branch only:
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Create));
(FileMode.Create is defined as
Specifies that the operating system should create a new file. If the file already exists, it will be overwritten. This requires Write permission. FileMode.Create is equivalent to requesting that if the file does not exist, use CreateNew; otherwise, use Truncate. If the file already exists but is a hidden file, an UnauthorizedAccessException exception is thrown.
Thus, it will also do what is required if the file already exists.)
You can recognize the problems of the code with the Append in it by running it a few times and watching the output file grow far beyond what is needed. Furthermore, if you open that file in Adobe Reader and close it again, Adobe Reader offers to save the changes; the changes are the repair work.
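The file growth caused by FileMode.Append can be reproduced without iText at all. The following is only a minimal sketch using plain byte arrays instead of PDF data; the class name FileModeDemo and the helper WriteWith are made up for illustration and are not part of the original code.

```csharp
using System;
using System.IO;

public static class FileModeDemo
{
    // Writes the payload using the given FileMode and returns the resulting file length.
    public static long WriteWith(string path, FileMode mode, byte[] payload)
    {
        using (var fs = new FileStream(path, mode, FileAccess.Write))
        {
            fs.Write(payload, 0, payload.Length);
        }
        return new FileInfo(path).Length;
    }

    public static void Main()
    {
        // Temporary file with a random name (placeholder data, not a real PDF)
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".bin");
        byte[] payload = new byte[100];
        try
        {
            Console.WriteLine(WriteWith(path, FileMode.Create, payload)); // 100: new file
            Console.WriteLine(WriteWith(path, FileMode.Append, payload)); // 200: old content kept, new content added
            Console.WriteLine(WriteWith(path, FileMode.Create, payload)); // 100: existing file truncated first
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With Append, every run adds a full copy of the payload to whatever is already there, which is exactly what happens to the stamper output in the question's code.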
You may have heard about incremental updates of PDFs where changes are appended to the original PDF. But this is different from a mere concatenation, the revisions in the result are linked specially and the offsets are always calculated from the start of the first revision, not from the start of the current revision. Furthermore, incremental updates should only contain changed objects.
iText contains a PdfStamper constructor with 4 parameters, including a final boolean parameter append. Using that constructor and setting append to true makes iText create incremental updates. But even here you don't use FileMode.Append...
The problem is with using the template file again for the second iteration.
First iteration: works fine as expected!
Second iteration: you are reading the same file and writing only the last name. Finally, the output file created in the first iteration is being replaced.
Fix: After knowing if the output file exists in the location, choose the file source to read like below. This should fix the problem. Checked it personally and it worked!
if (File.Exists(OutputPath + outputFile))
{
reader = new PdfReader(File.ReadAllBytes(OutputPath + outputFile));
// FileMode.Create truncates the existing file; FileMode.Append would concatenate two complete PDFs
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Create));
}
else
{
reader = new PdfReader(File.ReadAllBytes(templatePath));
pdfStamper = new PdfStamper(reader, new FileStream(OutputPath + outputFile,
FileMode.Create));
}

Getting number of pages from a pdf read as a memory stream in iTextSharp5 rather than file system?

Using iTextSharp 5, I am trying to get the number of pages of a PDF file that I am pulling through a memory stream.
using (var inms = new MemoryStream(file.Image))//file.Image is a byte array
{
var reader = new PdfReader(inms);
var pageCount = reader.NumberOfPages;
}
When I do this pageCount always comes out as 1 even though there are 18 pages in the document.
using (var pdfReader = new PdfReader(filePath))
{
var pageCount = pdfReader.NumberOfPages;
}
When I use the second method and read the document as a file from the file system it returns the expected 18 pages.
Any ideas on why this is and how to get around it?

Merge PDF based on size using Aspose

I am new to Aspose and need help with the following case.
I want to merge multiple PDFs into one PDF using Aspose. I can do that easily, but the problem is that I want to limit the PDF size to 200MB.
That means that if my merged PDF is larger than 200MB, I need to split it into multiple PDFs. For example, if my merged PDF is 300MB, the first PDF should be 200MB and the second PDF should be 100MB.
The main problem is that I am not able to find the size of the document in the code below, which is what I am using.
Document destinationPdfDocument = new Document();
Document sourcePdfDocument = new Document();
//Merge PDF one by one
for (int i = 0; i < filesFromDirectory.Count(); i++)
{
if (i == 0)
{
destinationPdfDocument = new Document(filesFromDirectory[i].FullName);
}
else
{
// Open second document
sourcePdfDocument = new Document(filesFromDirectory[i].FullName);
// Add pages of second document to the first
destinationPdfDocument.Pages.Add(sourcePdfDocument.Pages);
//** I need to check size of destinationPdfDocument over here to limit the size of resultant PDF**
}
}
// Encrypt PDF
destinationPdfDocument.Encrypt("userP", "ownerP", 0, CryptoAlgorithm.AESx128);
string finalPdfPath = Path.Combine(destinationSourceDirectory, destinatedPdfPath);
// Save concatenated output file
destinationPdfDocument.Save(finalPdfPath);
Any other way of merging PDFs based on size would also be appreciated.
Thanks in advance.
I am afraid that there is no direct way to determine the PDF file size before physically saving it. Therefore, we have logged a feature request as PDFNET-43073 in our issue tracking system, and the product team has been investigating the feasibility of this feature. As soon as we have significant updates regarding its availability, we will inform you. Please give us a little time.
However, as a workaround, you may save the document into a memory stream and check whether the size of that memory stream exceeds your desired PDF size. Please check the following code snippet, where we have generated PDFs with the desired size of 200MB using this approach.
//Instantiate document objects
Document destinationPdfDocument = new Document();
Document sourcePdfDocument = new Document();
//Load source files which are to be merged
var filesFromDirectory = Directory.GetFiles(dataDir, "*.pdf");
for (int i = 0; i < filesFromDirectory.Count(); i++)
{
if (i == 0)
{
destinationPdfDocument = new Document(filesFromDirectory[i]);
}
else
{
// Open second document
sourcePdfDocument = new Document(filesFromDirectory[i]);
// Add pages of second document to the first
destinationPdfDocument.Pages.Add(sourcePdfDocument.Pages);
// Save to a memory stream to measure the size of the merged document so far
long filesize;
using (MemoryStream ms = new MemoryStream())
{
destinationPdfDocument.Save(ms);
filesize = ms.Length;
}
// Compare the filesize in MBs
if (i == filesFromDirectory.Count() - 1)
{
destinationPdfDocument.Save(dataDir + "PDFOutput_" + i + ".pdf");
}
else if ((filesize / (1024 * 1024)) < 200)
continue;
else
{
destinationPdfDocument.Save(dataDir + "PDFOutput_" + i.ToString() + ".pdf");
destinationPdfDocument = new Document();
}
}
}
I hope this will be helpful. Please let us know if you need any further assistance.
I work with Aspose as Developer Evangelist.
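As a possible alternative that avoids saving the growing document to memory after every file, the source files could be grouped by their on-disk sizes before merging. This is only a rough sketch under a loud assumption: the merged PDF size is treated as approximately the sum of the input sizes, which need not hold (shared resources, compression and encryption can change it), so the memory stream check above remains the more reliable measure. The class SizeBatcher and the method Group are hypothetical names and not part of Aspose; in real code the sizes would come from something like FileInfo.Length.

```csharp
using System;
using System.Collections.Generic;

public static class SizeBatcher
{
    // Greedily groups sizes (in bytes) so that each group's total stays at or below limitBytes.
    // A single item larger than the limit ends up in a group of its own.
    public static List<List<long>> Group(IEnumerable<long> sizes, long limitBytes)
    {
        var groups = new List<List<long>>();
        var current = new List<long>();
        long total = 0;
        foreach (long size in sizes)
        {
            // Start a new group when adding this item would exceed the limit
            if (current.Count > 0 && total + size > limitBytes)
            {
                groups.Add(current);
                current = new List<long>();
                total = 0;
            }
            current.Add(size);
            total += size;
        }
        if (current.Count > 0)
        {
            groups.Add(current);
        }
        return groups;
    }

    public static void Main()
    {
        const long MB = 1024 * 1024;
        var groups = Group(new long[] { 120 * MB, 60 * MB, 50 * MB, 70 * MB }, 200 * MB);
        Console.WriteLine(groups.Count); // 2: [120MB, 60MB] and [50MB, 70MB]
    }
}
```

Each resulting group would then be merged into its own output document.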

iTextSharp PdfTextExtractor GetTextFromPage Throwing NullReferenceException

I am using iTextSharp to read PDF documents, but lately I'm getting a
{"Object reference not set to an instance of an object."}
or NullReferenceException when getting the text from a page of the PdfReader. It worked before, but as of today it no longer works. I didn't change my code.
Below is my code:
for (int i = 1; i <= reader.NumberOfPages; i++)
{
ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(reader, i, its);
if (currentText.Contains("ADVANCES"))
{
return i;
}
}
return 0;
The above code throws a null reference exception. reader is not null, and i obviously cannot be null, being an int.
I am instantiating the PdfReader from the input stream:
PdfReader reader = new PdfReader(_stream);
Below is the stack trace:
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.DisplayXObject(PdfName xobjectName)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
at iTextSharp.text.pdf.parser.PdfReaderContentParser.ProcessContent[E](Int32 pageNumber, E renderListener)
at iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(PdfReader reader, Int32 pageNumber, ITextExtractionStrategy strategy)
To keep things simple, I created a console application that just reads all the text from the PDF file and displays it. Below is the code. The result is the same as above: it throws a NullReferenceException.
class Program
{
static void Main(string[] args)
{
Console.WriteLine(ExtractTextFromPdf(@"stockQuotes_03232015.pdf"));
}
public static string ExtractTextFromPdf(string path)
{
using (PdfReader reader = new PdfReader(path))
{
StringBuilder text = new StringBuilder();
for (int i = 1; i <= reader.NumberOfPages; i++)
{
text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
}
return text.ToString();
}
}
}
Does anyone know what might be going on here or how i might work around it?
To summarize what has been found out in the comments to the question...
In short
The PDF the OP used at first is invalid: it is missing required objects which are of interest to the parser.
Since he finally got hold of a valid version, he is now able to parse it successfully.
In detail
Depending on the time and mode of request, the web site the PDFs in question were requested from returned different versions of the same document, sometimes complete, sometimes in an invalid manner incomplete.
The test file was stockQuotes_03232015.pdf, i.e. the PDF containing the data generated on the test day:
A complete copy
An incomplete, invalid copy
The complete file could already be recognized by size, in my downloads it is 250933 bytes long while my incomplete file is 81062 bytes long.
Inspecting the files, it looks like the incomplete file was derived from the complete one by some tool which removed duplicate image streams but forgot to replace the references to the removed streams with references to the retained stream object.
Please use the code below to read text from a PDF. It shows the text from the PDF in a RichTextBox named richTextBox1.
Reference Youtube: https://www.youtube.com/watch?v=22C9N4WP4-s
using (OpenFileDialog ofd = new OpenFileDialog() { Filter = "PDF files|*.pdf", ValidateNames = true, Multiselect = false })
{
if(ofd.ShowDialog() == DialogResult.OK)
{
try
{
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(ofd.FileName);
StringBuilder sb = new StringBuilder();
// Pages are 1-based, so the last page is included with <=
for(int i = 1; i <= reader.NumberOfPages; i++)
{
sb.Append(iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader,i));
}
richTextBox1.Text = sb.ToString();
reader.Close();
}
catch (Exception ex)
{
MessageBox.Show(ex.Message, "Message", MessageBoxButtons.OK, MessageBoxIcon.Error);
}
}
}

Remove outer print marks on PDF iTextSharp

I have a pdf file with a cover that looks like the following:
Now, I need to remove the so-called 'galley marks' around the edges of the cover. I am using iTextSharp with C# and I need code using iTextSharp to create a new document with only the intended cover or use PdfStamper to remove that. Or any other solution using iTextSharp that would deliver the results.
I have been unable to find any good code samples in my search to this point.
Do you have to actually remove them or can you just crop them out? If you can just crop them out then the code below will work. If you have to actually remove them from the file then to the best of my knowledge there isn't a simple way to do that. Those objects aren't explicitly marked as meta-objects to the best of my knowledge. The only way I can think of to remove them would be to inspect everything and see if it fits into the document's active area.
Below is sample code that reads each page in the input file and finds the various boxes that might exist, trim, art and bleed. (See this page.)
As long as it finds at least one it sets the page's crop box to the first item in the list. In your case you might actually have to perform some logic to find the "smallest" of all of those items or you might be able to just know that "art" will always work for you. See the code for additional comments. This targets iTextSharp 5.4.0.0.
//Sample input file
var inputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Binder1.pdf");
//Sample output file
var outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Cropped.pdf");
//Bind a reader to our input file
using (var r = new PdfReader(inputFile)) {
//Get the number of pages
var pageCount = r.NumberOfPages;
//See this for a list: http://api.itextpdf.com/itext/com/itextpdf/text/pdf/PdfReader.html#getBoxSize(int, java.lang.String)
var boxNames = new string[] { "trim", "art", "bleed" };
//We'll create a list of all possible boxes to pick from later
List<iTextSharp.text.Rectangle> boxes;
//Loop through each page
for (var i = 1; i <= pageCount; i++) {
//Initialize our list for this page
boxes = new List<iTextSharp.text.Rectangle>();
//Loop through the list of known boxes
for (var j = 0; j < boxNames.Length; j++) {
//If the box exists
if(r.GetBoxSize(i, boxNames[j]) != null){
//Add it to our collection
boxes.Add(r.GetBoxSize(i, boxNames[j]));
}
}
//If we found at least one box
if (boxes.Count > 0) {
//Get the page's entire dictionary
var dict = r.GetPageN(i);
//At this point we might want to apply some logic to find the "inner most" box if our trim/bleed/art aren't all the same
//I'm just hard-coding the first item in the list for demonstration purposes
//Set the page's crop box to the specified box
dict.Put(PdfName.CROPBOX, new PdfRectangle(boxes[0]));
}
}
//Create our output file
using (var fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
//Bind a stamper to our reader and output file
using(var stamper = new PdfStamper(r,fs)){
//We did all of our PDF manipulation above so we don't actually have to do anything here
}
}
}
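The "find the smallest box" logic mentioned in the code comments could look roughly like the sketch below. It is deliberately independent of iTextSharp: Box is a hypothetical stand-in for iTextSharp.text.Rectangle (only a name, a width and a height), and choosing by smallest area is an assumption that works when the boxes are nested inside each other, which is the usual case for trim, art and bleed boxes but is not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoxPicker
{
    // Hypothetical stand-in for a page box; not an iTextSharp type.
    public sealed class Box
    {
        public string Name { get; }
        public float Width { get; }
        public float Height { get; }

        public Box(string name, float width, float height)
        {
            Name = name;
            Width = width;
            Height = height;
        }

        public float Area => Width * Height;
    }

    // Returns the box with the smallest area, or null if the sequence is empty.
    public static Box Smallest(IEnumerable<Box> boxes)
    {
        return boxes.OrderBy(b => b.Area).FirstOrDefault();
    }

    public static void Main()
    {
        var boxes = new List<Box>
        {
            new Box("trim", 400f, 600f),
            new Box("art", 380f, 580f),
            new Box("bleed", 420f, 620f)
        };
        Console.WriteLine(Smallest(boxes).Name); // art
    }
}
```

In the sample above, the same ordering could be applied to the boxes list before picking the first item, instead of hard-coding boxes[0].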
