ITextsharp: Error reading a pdf file in Byte[] content (PdfReader) - c#

I'm trying to merge several PDFs into a single file through a list that contains their content in byte[]. When opening a document from the Byte[] list with PdfReader, the program launches the following exception: "the document has no pages". When I review the contents of the Byte[] list there are complete, but the exception is always launched.
I try to download the content of that single page separately and the generated document launches error when opening it. The division of the pdf does well because it generates each document in physical and makes it perfect for each page of the PDF.
I appreciate your help or opinions in this situation.
This is the code I use to split and merge documents:
public List<byte[]> SplitPDF(byte[] contentPdf)
{
try
{
var listBythe = new List<byte[]>();
PdfImportedPage page = null;
PdfCopy PdfCopy = null;
PdfReader reader = new PdfReader(contentPdf);
for (int numPage = 1; numPage <= reader.NumberOfPages; numPage++)
{
Document doc = new Document(PageSize.LETTER);
var mStream = new MemoryStream();
PdfCopy = new PdfCopy(doc, mStream);
doc.Open();
page = PdfCopy.GetImportedPage(reader, numPage);
PdfCopy.AddPage(page);
listBythe.Add(mStream.ToArray());
doc.Close();
}
MergePdfToPage(listBythe);
return listBythe;
}
catch (Exception ex)
{
throw ex;
}
}
private byte[] MergePdfToPage(List<byte[]>contentPage)
{
byte[] docPdfByte = null;
var ms = new MemoryStream();
using (Document doc = new Document(PageSize.LETTER))
{
PdfCopy copy = new PdfCopy(doc, ms);
doc.Open();
var num = doc.PageNumber;
foreach (var file in contentPage.ToArray())
{
using (var reader = new PdfReader(file))
{
copy.AddDocument(reader);
}
}
doc.Close();
docPdfByte = ms.ToArray();
}
return docPdfByte;

In your loop you do
Document doc = new Document(PageSize.LETTER);
var mStream = new MemoryStream();
PdfCopy = new PdfCopy(doc, mStream);
doc.Open();
page = PdfCopy.GetImportedPage(reader, numPage);
PdfCopy.AddPage(page);
listBythe.Add(mStream.ToArray());
doc.Close();
In particular you retrieve the mStream bytes before closing doc. But before doc is closed, the pdf is incomplete in mStream!
To get a complete pdf from mStream, please change the order of instructions an do
Document doc = new Document(PageSize.LETTER);
var mStream = new MemoryStream();
PdfCopy = new PdfCopy(doc, mStream);
doc.Open();
page = PdfCopy.GetImportedPage(reader, numPage);
PdfCopy.AddPage(page);
doc.Close();
listBythe.Add(mStream.ToArray());
instead.

I created something for you, hopefully it will work as well as it did for me.
Class :
public class PDFFactory
{
public PDFFactory()
{
PdfDocument = new Document(iTextSharp.text.PageSize.A4, 65, 65, 60, 60);
}
private Document _pdfDocument;
public Document PdfDocument
{
get
{
return _pdfDocument;
}
set
{
_pdfDocument = value;
}
}
private MemoryStream _pdfMemoryStream;
public MemoryStream PDFMemoryStream
{
get
{
return _pdfMemoryStream;
}
set
{
_pdfMemoryStream = value;
}
}
private string _pdfBase64;
public string PDFBase64
{
get
{
if (this.DocumentClosed)
return _pdfBase64;
else
return null;
}
set
{
_pdfBase64 = value;
}
}
private byte[] _pdfBytes;
public byte[] PDFBytes
{
get
{
if (this.DocumentClosed)
return _pdfBytes;
else
return null;
}
set
{
_pdfBytes = value;
}
}
public byte[] GetPDFBytes()
{
PDFDocument.Close();
return PDFMemoryStream.GetBuffer();
}
public void closeDocument()
{
PDFDocument.Close();
PDFBase64 = Convert.ToBase64String(this.PDFMemoryStream.GetBuffer());
PDFBytes = this.PDFMemoryStream.GetBuffer();
}
}
Service:
public byte[] ()
{
PDFFactory pdf_1 = new PDFFactory();
PDFFactory pdf_2 = new PDFFactory();
List<byte[]> sourceFiles = new List<byte[]>();
sourceFiles.Add(pdf_1.GetPDFBytes);
sourceFiles.Add(pdf_2.GetPDFBytes);
PDFFactory pdfFinal = new PDFFactory();
for (int fileCounter = 0; fileCounter <= sourceFiles.Count - 1; fileCounter += 1)
{
PdfReader reader2 = new PdfReader(sourceFiles[fileCounter]);
int numberOfPages = reader2.NumberOfPages;
for (int currentPageIndex = 1; currentPageIndex <= numberOfPages; currentPageIndex++)
{
// Determine page size for the current page
pdfFinal.PDFDocument.SetPageSize(reader2.GetPageSizeWithRotation(currentPageIndex));
// Create page
pdfFinal.PDFDocument.NewPage();
PdfImportedPage importedPage = pdfFinal.PDFWriter.GetImportedPage(reader2, currentPageIndex);
// Determine page orientation
int pageOrientation = reader2.GetPageRotation(currentPageIndex);
if ((pageOrientation == 90) || (pageOrientation == 270))
pdfFinal.PDFWriter.DirectContent.AddTemplate(importedPage, 0, -1.0F, 1.0F, 0, 0, reader2.GetPageSizeWithRotation(currentPageIndex).Height);
else
pdfFinal.PDFWriter.DirectContent.AddTemplate(importedPage, 1.0F, 0, 0, 1.0F, 0, 0);
}
}
pdfFinal.closeDocument();
return pdfFinal.PDFBytes;
}
Let me know if it helped.

Related

Clipping pdf with itextsharp 5 results in a corrupt PDF

I want to read a PDF file using itextsharp 5, and create a new PDF file with the left half of each page of the original PDF, discarding the right half. I wrote the following code, but the result is corrupt and cannot be opened with Acrobat Reader (opening it with Chrome works though). How can I make a PDF that works with Acrobat Reader?
private byte[] ObtenerMitadPdf(byte[] Contenido)
{
using (var pdfReader = new PdfReader(Contenido))
{
using (var memoryStream = new MemoryStream())
{
using (var documento = new Document())
using (var pdfWriter = PdfWriter.GetInstance(documento, memoryStream))
{
pdfWriter.CloseStream = false;
documento.Open();
for (int i = 1; i <= pdfReader.NumberOfPages; i++)
{
var pagina = pdfWriter.GetImportedPage(pdfReader, i);
var tamañoPagina = pdfReader.GetPageSizeWithRotation(i);
var nuevoTamaño = new Rectangle(tamañoPagina.Left, tamañoPagina.Bottom,
tamañoPagina.Left + (tamañoPagina.Width / 2), tamañoPagina.Top, tamañoPagina.Rotation);
if (tamañoPagina.Rotation == 90)
{
pdfWriter.DirectContent.AddTemplate(pagina, 0, -1, 1, 0, 0, tamañoPagina.Height);
}
else
{
pdfWriter.DirectContent.AddTemplate(pagina, 0, 0);
}
documento.SetPageSize(nuevoTamaño);
documento.NewPage();
}
}
return memoryStream.ToArray();
}
}
}
I finally managed to get it working!
private byte[] ObtenerMitadPdf(byte[] Contenido)
{
using (var pdfReader = new PdfReader(Contenido))
{
using (var memoryStream = new MemoryStream())
{
using (var stamper = new PdfStamper(pdfReader, memoryStream))
{
for (int i = 1; i <= pdfReader.NumberOfPages; i++)
{
var pagina = pdfReader.GetPageN(i);
var tamañoPagina = pdfReader.GetPageSizeWithRotation(i);
var mediaBox = new PdfRectangle(tamañoPagina.Left, tamañoPagina.Bottom,
tamañoPagina.Left + (tamañoPagina.Width / 2), tamañoPagina.Top, tamañoPagina.Rotation);
pagina.Put(PdfName.MEDIABOX, mediaBox);
stamper.MarkUsed(pagina);
}
}
return memoryStream.ToArray();
}
}
}

How to use iTextSharp to save certain pages to a MemoryStream and return selected page as base64 string

Using iTextSharp, I was able to read a base64 string and convert certain pages into local files using a FileStream. I want to do the same without saving to the local filesystem, using MemoryStream and only returning the selected pages to the calling function as a base64 string.
// function takes reader and start and end page and destination file path as parameter.
public static void ExtractPages(PdfReader pdfReader, string sourcePdfPath,
string outputPdfPath, int startPage, int endPage)
{
PdfReader reader = null;
Document sourceDocument = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage = null;
try
{
reader = pdfReader;
sourceDocument = new Document(reader.GetPageSizeWithRotation(startPage));
// old code to save selected pdf into local file system.
FileStream fileStream = new FileStream(outputPdfPath, FileMode.Create);
// memory stream created but not been used yet!
MemoryStream memoryStream = new MemoryStream();
pdfCopyProvider = new PdfCopy(sourceDocument, fileStream);
sourceDocument.Open();
// save selected page into local file system
for (int i = startPage; i <= endPage; i++)
{
importedPage = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(importedPage);
}
#region TestMemoryStream
// TODO: New code to be added and return selected page data as bas64 string
#endregion
sourceDocument.Close();
reader.Close();
}
catch (Exception ex)
{
throw ex;
}
}
I was able to sort out the issue, my modified code is below.
reader = pdfReader;
sourceDocument = new Document(reader.GetPageSizeWithRotation(startPage));
//FileStream fileStream = new FileStream(outputPdfPath, FileMode.Create);
MemoryStream memoryStream = new MemoryStream();
pdfCopyProvider = new PdfCopy(sourceDocument, memoryStream);
sourceDocument.Open();
for (int i = startPage; i <= endPage; i++)
{
importedPage = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(importedPage);
}
sourceDocument.Close();
reader.Close();
byte[] content1 = memoryStream.ToArray();
// var bas64 = Convert.ToBase64String(content1);
string bas64New = Convert.ToBase64String(content1);

Concatenating PDF using PDFsharp returns empty PDF

I have a list of PDF stored as list<byte[]>. I try to concatenate all these PDF files using PDFsharp, but after my operation I get a PDF with proper page count, but all pages are blank. Looks like I lose some header or something but I can't find where.
My code:
PdfDocument output = new PdfDocument();
try
{
foreach (var report in reports)
{
using (MemoryStream stream = new MemoryStream(report))
{
PdfDocument input = PdfReader.Open(stream, PdfDocumentOpenMode.Import);
foreach (PdfPage page in input.Pages)
{
output.AddPage(page);
}
}
}
if (output.Pages.Count <= 0)
{
throw new Exception("Empty Document");
}
MemoryStream final = new MemoryStream();
output.Save(final);
output.Close();
return final.ToArray();
}
catch (Exception e)
{
throw new Exception(e.ToString());
}
I want to return it as byte[] because I use them later:
return File(report, System.Net.Mime.MediaTypeNames.Application.Octet, "test.pdf");
This returns PDF with proper page count, but all blank.
You tell in a comment that the files come from SSRS.
Older versions of PDFsharp require a special SSRS setting:
For the DeviceSettings parameter for the Render method on the ReportExecutionService object, pass this value:
theDeviceSettings = "<DeviceInfo><HumanReadablePDF>True</HumanReadablePDF></DeviceInfo>";
Source:
http://forum.pdfsharp.net/viewtopic.php?p=1613#p1613
I use iTextSharp, look at this saple code (it works)
public static byte[] PdfJoin(List<String> pdfs)
{
byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
using (iTextSharp.text.Document document = new iTextSharp.text.Document())
{
using (iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(document, ms))
{
document.Open();
for (int i = 0; i < pdfs.Count; ++i)
{
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdfs[i]);
// loop over the pages in that document
int n = reader.NumberOfPages;
for (int page = 0; page < n; )
{
copy.AddPage(copy.GetImportedPage(reader, ++page));
}
}
}
}
mergedPdf = ms.ToArray();
}
return mergedPdf;
}
public static byte[] PdfJoin(List<byte[]> pdfs)
{
byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
using (iTextSharp.text.Document document = new iTextSharp.text.Document())
{
using (iTextSharp.text.pdf.PdfCopy copy = new iTextSharp.text.pdf.PdfCopy(document, ms))
{
document.Open();
for (int i = 0; i < pdfs.Count; ++i)
{
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(pdfs[i]);
// loop over the pages in that document
int n = reader.NumberOfPages;
for (int page = 0; page < n; )
{
copy.AddPage(copy.GetImportedPage(reader, ++page));
}
}
}
}
mergedPdf = ms.ToArray();
}
return mergedPdf;
}

Extract pages in memory iTextSharp

can someone help me with this problem. How can I extract some pages from pdf and return them as byte array or Stream, without using physical file as output.
Here is the way doing this using filestream:
public static void ExtractPages(string sourcePdfPath, string outputPdfPath, int startPage, int endPage)
{
PdfReader reader = null;
Document sourceDocument = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage = null;
try
{
reader = new PdfReader(sourcePdfPath);
sourceDocument = new Document(reader.GetPageSizeWithRotation(startPage));
pdfCopyProvider = new PdfCopy(sourceDocument, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
sourceDocument.Open();
for(int i = startPage; i <= endPage; i++)
{
importedPage = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(importedPage);
}
sourceDocument.Close();
reader.Close();
}
catch(Exception ex)
{
throw ex;
}
}
I need something like:
public static byte[] ExtractPages(string sourcePdfPath, int startPage, int endPage)
{
....
return byte[];
}
Replace the new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create) with a MemoryStream and return it.
Some thing like (untested, but should work):
public static byte[] ExtractPages(string sourcePdfPath, int startPage, int endPage)
{
PdfReader reader = null;
Document sourceDocument = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage = null;
MemoryStream target = new MemoryStream();
reader = new PdfReader(sourcePdfPath);
sourceDocument = new Document(reader.GetPageSizeWithRotation(startPage));
pdfCopyProvider = new PdfCopy(sourceDocument, target);
sourceDocument.Open();
for(int i = startPage; i <= endPage; i++)
{
importedPage = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(importedPage);
}
sourceDocument.Close();
reader.Close();
return target.ToArray();
}

iTextSharp Efficient Batch Pdf Generation From a 1 Page Template

I am generating a multiple page PDF using ITextSharp, each page having the sames template.
The problem is that the PDF grows in Phisical Size by the size of the template.
I HAVE to use ACROFIELDS.
How can I reduce the final file size ?
Here is a code snippet of the pdf handler:
public void ProcessRequest(HttpContext context)
{
Context = context;
Response = context.Response;
Request = context.Request;
try
{
LoadDataInternal();
}
catch (System.Threading.ThreadAbortException)
{
// no-op
}
catch (Exception ex)
{
Logger.LogError(ex);
Response.Write("Error");
Response.End();
}
Response.BufferOutput = true;
Response.ClearHeaders();
Response.ContentType = "application/pdf";
if (true)
{
Response.AddHeader("Content-Disposition", "attachment; filename=" +
(string.IsNullOrEmpty(DownloadFileName) ? context.Session.SessionID + ".pdf" : DownloadFileName));
}
PdfCopyFields copy
= new PdfCopyFields(Response.OutputStream);
// add a document
for (int i = 0; i < dataset.Tables["Model"].Rows.Count; i++)
{
copy.AddDocument(new PdfReader(renameFieldsIn(TemplatePath, i)));
// add a document
}
copy.SetFullCompression();
// Close the PdfCopyFields object
copy.Close();
}
private byte[] renameFieldsIn(String datasheet, int i)
{
MemoryStream baos = new MemoryStream();
// Create the stamper
PdfStamper stamper = new PdfStamper(new PdfReader(GetTemplateBytes()), baos);
// Get the fields
AcroFields form = stamper.AcroFields;
// Loop over the fields
List<String> keys = new List<String>(form.Fields.Keys);
foreach (var key in keys)
{
// rename the fields
form.RenameField(key, String.Format("{0}_{1}", key, i));
}
stamper.FormFlattening = true;
stamper.FreeTextFlattening = true;
stamper.SetFullCompression();
SetFieldsInternal(form, i);
// close the stamper
stamper.Close();
return baos.ToArray();
}
protected byte[] GetTemplateBytes()
{
var data = Context.Cache[PdfTemplateCacheKey] as byte[];
if (data == null)
{
data = File.ReadAllBytes(Context.Server.MapPath(TemplatePath));
Context.Cache.Insert(PdfTemplateCacheKey, data,
null, DateTime.Now.Add(PdfTemplateCacheDuration), Cache.NoSlidingExpiration);
}
return data;
}
I have done something a little similar before and found that after combining all the pages, running the entire generated document through the PDFStamper again resulted in a noticeable compression of the file size.
Ok, so a friend of mine came up with a solution. The only problem remaining is the amount of memory it takes to create a new PdfStamper. So here's the rewritten procedure.
Anyway. Hope this hepls anyone else. And if you have a better solution please don't be shy.
public void ProcessRequest (HttpContext context)
{
var watch = System.Diagnostics.Stopwatch.StartNew();
Context = context;
Response = context.Response;
Request = context.Request;
Response.BufferOutput = true;
Response.ClearHeaders();
Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", "attachment; filename=itextsharp_multiple.pdf");
var pageBytes = (byte[])null;
var pagesAll = new List<byte[]>();
try
{
for (int i = 0; i < GlobalHttpApplication.Model.TestData.Rows.Count; i++)
{
PdfStamper pst = null;
MemoryStream mstr = null;
using (mstr = new MemoryStream())
{
try
{
PdfReader reader = new PdfReader(GetTemplateBytes());
pst = new PdfStamper(reader, mstr);
var acroFields = pst.AcroFields;
SetFieldsInternal(acroFields, 0);
pst.FormFlattening = true;
pst.SetFullCompression();
} finally
{
if (pst != null)
pst.Close();
}
}
pageBytes = mstr.ToArray();
pagesAll.Add(pageBytes);
}
} finally
{
}
Document doc = new Document(PageSize.A4);
var writer = PdfWriter.GetInstance(doc, Response.OutputStream);
var copy2 = new PdfSmartCopy(doc, Response.OutputStream);
doc.Open();
doc.OpenDocument();
foreach (var b in pagesAll)
{
doc.NewPage();
copy2.AddPage(copy2.GetImportedPage(new PdfReader(b), 1));
}
copy2.Close();
watch.Stop();
File.AppendAllText(context.Server.MapPath("~/App_Data/log.txt"), watch.Elapsed + Environment.NewLine);
}

Categories