iTextSharp generated PDF works in browser but not after download - c#

I have a PDF which is generated by iTextSharp in C#. It's a template PDF that gets some additional lines of text added using PdfStamper, is then pushed to S3, and is finally returned to the browser as a file stream (using MVC.NET).
The newly added lines show fine when the PDF is viewed in the browser (Chrome), but when I download the PDF and open it locally (with Preview or Adobe Acrobat on Mac), only the template shows and the newly added lines are gone.
What could cause this?
Here's a (condensed) code example:
using (var receiptTemplateStream = GetType().Assembly.GetManifestResourceStream("XXXXX.DepositReceipts.Receipt.pdf"))
{
    var reader = new PdfReader(receiptTemplateStream);
    var outputPdfStream = new MemoryStream();
    var stamper = new PdfStamper(reader, outputPdfStream) { FormFlattening = true, FreeTextFlattening = true };
    var _pbover = stamper.GetOverContent(1);

    using (var latoLightStream = GetType().Assembly.GetManifestResourceStream("XXXXX.DepositReceipts.Fonts.Lato-Light.ttf"))
    using (var latoLightMS = new MemoryStream())
    {
        // Load the embedded TTF into a BaseFont (this step was omitted from the condensed snippet)
        latoLightStream.CopyTo(latoLightMS);
        var latoLight = BaseFont.CreateFont("Lato-Light.ttf", BaseFont.CP1252, BaseFont.EMBEDDED, true, latoLightMS.ToArray(), null);

        // Write the extra lines over page 1 of the template
        _pbover.BeginText();
        _pbover.SetFontAndSize(latoLight, 11.0f);
        var verticalPosition = 650;
        _pbover.ShowTextAligned(0, account.company_name, 45, verticalPosition, 0); // 0 = left aligned
        verticalPosition = verticalPosition - 15;
        // ... more lines added the same way ...
        _pbover.EndText();

        var filename = "Receipt 0001.pdf";
        stamper.SetFullCompression();
        stamper.Close();

        var file = outputPdfStream.ToArray();
        using (var output = new MemoryStream())
        {
            output.Write(file, 0, file.Length);
            output.Position = 0;
            var response = await _s3Client.PutObjectAsync(new PutObjectRequest()
            {
                InputStream = output,
                BucketName = "XXXX",
                CannedACL = S3CannedACL.Private,
                Key = filename
            });
        }
        return filename;
    }
}

This was a funky one!
I had another method in the same solution that worked without problems. It turns out that the template PDF that I load and write my content over was the issue.
The template I used was generated in Adobe Illustrator. I had another one which was generated in Adobe InDesign, and that one worked.
When I pulled this template PDF into InDesign and then exported it again (from InDesign), it suddenly worked.
I'm not sure exactly what caused this issue, but it must be some sort of encoding problem in the Illustrator-generated file.
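For what it's worth, a possible programmatic workaround, sketched here but not verified against this particular template (file names are hypothetical), is to round-trip the problematic template through iTextSharp once, so the file structure gets rewritten much like re-exporting it from InDesign did:
var templateReader = new PdfReader("Receipt-illustrator.pdf");              // hypothetical path
using (var normalized = new FileStream("Receipt-normalized.pdf", FileMode.Create))
{
    var copyStamper = new PdfStamper(templateReader, normalized);
    copyStamper.Close(); // writing the file back out regenerates the cross-reference table and objects
}
templateReader.Close();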

Related

C# iText7 - 'Trailer Not Found' when using PdfReader with PDF string from database

I'm saving the contents of my PDF file (pdfAsString) to the database.
The file is of type IFormFile (uploaded by the user).
string pdfAsString;
using (var reader = new StreamReader(indexModel.UploadModel.File.OpenReadStream()))
{
    pdfAsString = await reader.ReadToEndAsync();
    // pdfAsString = ; // encoding function or lack thereof
}
Later I'm trying to fetch and use these contents to initialize a new instance of MemoryStream, then using that to create a PdfReader and then using that to create a PdfDocument, but at this point I get the 'Trailer not found' exception. I have verified that the Trailer part of the PDF is present inside the contents of the string that I use to create the MemoryStream. I have also made sure the position is set to the beginning of the file.
The issue seems related to the format of the PDF contents fetched from the database; iText 7 doesn't seem to be able to read anything in it beyond the beginning of the file.
I'm expecting to be able to create an instance of PdfDocument with the contents of the PDF saved to my database.
Note 1: Using the Stream created from OpenReadStream() works when trying to create a PdfReader and then PdfDocument, but I don't have access to that IFormFile when reading from the DB, so this doesn't help me in my use case.
Note 2: If I use the PDF from my device by giving a path, it works correctly, same for using a FileStream created from a path. However, this doesn't help my use case.
So far, I've tried saving it raw and then using that right out of the gate (1) or encoding special symbols like \n \t to ASCII hexadecimal notation (2). I've also tried HttpUtility.UrlEncode on save and UrlDecode after getting the database record (3), and also tried ToBase64String on save and FromBase64String on get (4).
// var pdfContent = databaseString; // 1
// var pdfContent = databaseString.EncodeSpecialCharacters(); // encode special symbols // 2
// var pdfContent = HttpUtility.UrlDecode(databaseString); // decode urlencoded string // 3
var pdfContent = Convert.FromBase64String(databaseString); // decode base64 // 4
using (var stream = new MemoryStream(pdfContent))
{
    PdfReader pdfReader = new PdfReader(stream).SetUnethicalReading(true);
    PdfWriter pdfWriter = new PdfWriter("new-file.pdf");
    PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter); // exception here :(
    // some business logic...
}
Any help would be appreciated.
EDIT: in a separate project, I'm trying to run this code:
using (var stream = File.OpenRead("C:\\<path>\\<filename>.pdf"))
{
    var formFile = new FormFile(stream, 0, stream.Length, null, "<filename>.pdf");
    var reader = new StreamReader(formFile.OpenReadStream());
    var pdfAsString = reader.ReadToEnd();
    var pdfAsBytes = Encoding.UTF8.GetBytes(pdfAsString);
    using (var newStream = new MemoryStream(pdfAsBytes))
    {
        newStream.Seek(0, SeekOrigin.Begin);
        var pdfReader = new PdfReader(newStream).SetUnethicalReading(true);
        var pdfWriter = new PdfWriter("Test-PDF-1.pdf");
        PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter);
        PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
        IDictionary<string, PdfFormField> fields = form.GetFormFields();
        foreach (var field in fields)
        {
            field.Value.SetValue(field.Key);
        }
        //form.FlattenFields();
        pdf.Close();
    }
}
If I replace newStream inside the PdfReader with formFile.OpenReadStream(), it works fine; otherwise I get the 'Trailer not found' exception.
Answer: use BinaryReader and ReadBytes instead of StreamReader when initially trying to read the data. Example below:
using (var stream = File.OpenRead("C:\\<filepath>\\<filename>.pdf"))
{
    // FormFile - my starting point inside of the web application
    var formFile = new FormFile(stream, 0, stream.Length, null, "<filename>.pdf");
    var reader = new BinaryReader(formFile.OpenReadStream());
    var pdfAsBytes = reader.ReadBytes((int)formFile.Length); // store this in the database
    using (var newStream = new MemoryStream(pdfAsBytes))
    {
        newStream.Seek(0, SeekOrigin.Begin);
        var pdfReader = new PdfReader(newStream).SetUnethicalReading(true);
        var pdfWriter = new PdfWriter("Test-PDF-1.pdf");
        PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter);
        PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
        IDictionary<string, PdfFormField> fields = form.GetFormFields();
        foreach (var field in fields)
        {
            field.Value.SetValue(field.Key);
        }
        //form.FlattenFields();
        pdf.Close();
    }
}
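The underlying problem is that StreamReader decodes the PDF's raw bytes as text (UTF-8 by default); arbitrary binary sequences are not valid UTF-8, so the decode/re-encode round trip changes bytes and the cross-reference offsets no longer match the file, which is why iText reports 'Trailer not found'. If the database column has to stay a string, the base64 approach (attempt 4 above) works once the bytes are read without a text decode. A minimal sketch, reusing the question's indexModel.UploadModel.File:
byte[] pdfAsBytes;
using (var ms = new MemoryStream())
{
    await indexModel.UploadModel.File.CopyToAsync(ms); // IFormFile -> raw bytes, no text decoding
    pdfAsBytes = ms.ToArray();
}
string pdfForDb = Convert.ToBase64String(pdfAsBytes);         // store this string in the database
// later, when reading the record back:
byte[] restored = Convert.FromBase64String(databaseString);
using (var stream = new MemoryStream(restored))
{
    // new PdfReader(stream) can now find the trailer
}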

Adding a HTML page as a last page to PDF document

I am creating a PDF document consisting of 6 images (1 image per page) using iTextSharp.
I need to add an HTML page as the last page, after the 6th image.
I have tried the code below, but the HTML does not get added on a new page; instead it gets attached immediately below the 5th image.
Please advise how to make the HTML go on the last page.
Code for reference:
string ImagePath = HttpContext.Current.Server.MapPath("~/Images/");
string[] fileNames = System.IO.Directory.GetFiles(ImagePath);
string outputFileNames = "Test.pdf";
string outputFilePath = System.Web.Hosting.HostingEnvironment.MapPath("~/Pdf/" + outputFileNames);

Document doc = new Document(PageSize.A4, 20, 20, 20, 20);
System.IO.Stream st = new FileStream(outputFilePath, FileMode.Create, FileAccess.Write);
PdfWriter writer = PdfWriter.GetInstance(doc, st);
doc.Open();
writer.PageEvent = new Footer();

for (int i = 0; i < fileNames.Length; i++)
{
    string fname = fileNames[i];
    if (System.IO.File.Exists(fname) && Path.GetExtension(fname) == ".png")
    {
        iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(fname);
        img.Border = iTextSharp.text.Rectangle.BOX;
        img.BorderColor = iTextSharp.text.BaseColor.BLACK;
        doc.Add(img);
    }
}
byte[] pdf; // result will be here
var cssText = File.ReadAllText(MapPath("~/Style1.css"));
var html = File.ReadAllText(MapPath("~/HtmlPage1.html"));
using (var memoryStream = new MemoryStream())
{
    using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
    {
        using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
        {
            XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, htmlMemoryStream, cssMemoryStream);
        }
    }
    pdf = memoryStream.ToArray();
    //document.Add(new Paragraph(Encoding.UTF8.GetString(pdf)));
}
doc.NewPage();
doc.Add(new Paragraph(Encoding.UTF8.GetString(pdf)));
doc.Close();
writer.Close();
I need to add an HTML page as the last page, after the 6th image.
Any help is appreciated.
In contrast to what you assume according to your code comments, pdf is not where the result will be. It remains empty:
byte[] pdf; // result will be here
...
using (var memoryStream = new MemoryStream())
{
    ... code not accessing memoryStream ...
    pdf = memoryStream.ToArray();
    //document.Add(new Paragraph(Encoding.UTF8.GetString(pdf)));
}
doc.NewPage();
doc.Add(new Paragraph(Encoding.UTF8.GetString(pdf)));
Thus, the new page is only added just before an empty paragraph, after the converted HTML has already been added to the document.
Actually it is added during
XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, htmlMemoryStream, cssMemoryStream);
So you have to add the new page before that call. Thus, the following, replacing everything from your byte[] pdf; line onward, should do the job:
var cssText = File.ReadAllText(MapPath("~/Style1.css"));
var html = File.ReadAllText(MapPath("~/HtmlPage1.html"));
using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
{
    using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
    {
        doc.NewPage();
        XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, htmlMemoryStream, cssMemoryStream);
    }
}
doc.Close();
As an aside, don't close the writer! It is implicitly closed when the doc is closed; closing it again does nothing at best, or does damage otherwise.
In a comment you claimed
but this also does not resolve the issue... the pdf content still get added after the image and then continued on new page.
So I tested the proposed change. Obviously I don't have your environment, nor your image, html, and css files, so I used my own: a small screenshot and "<html><body><h1>Test</h1><p>This is a test piece of html</p></body></html>".
With your original code, the html content starts immediately after the image; with the code changed as described above, it starts on a new page (screenshots omitted here).
My impression here is that the proposed code change does resolve the issue: the html content is added on a new page.
Thus, apparently you either applied the proposed change incorrectly, executed old code, or inspected an old result.

Add Named Destination to PDF using iTextSharp

I'm trying to add a named destination to an existing PDF using iTextSharp, but the example code I've found online has not worked (the original example is in javascript). Am I doing something wrong? Below is the code I'm using, New Text Document.pdf is a single page PDF created from a txt file.
var sourcePdfPath = @"C:\Temp\New Text Document.pdf";
var destPdfPath = @"C:\Temp\New Text Document_bettered.pdf";
using (var reader = new PdfReader(sourcePdfPath))
{
    using (var stamper = new PdfStamper(reader, new FileStream(destPdfPath, FileMode.Create)))
    {
        var destination = new PdfDestination(PdfDestination.FIT);
        var writer = stamper.Writer;
        writer.AddNamedDestination("Destination.Name", 1, destination);
        stamper.Close();
    }
    reader.Close();
}

Save generated PDF file in azure

I have a form in ASP.NET, and when I fill it in, the last step generates a PDF file. I used jsPDF.
What I want is for the generated PDF file to be sent (saved) to Azure Storage. Can anyone help me?
Thank you
UPDATE: This is the code I'm trying. It works, but it only extracts the text; it doesn't save the PDF as it is:
var account = new CloudStorageAccount(new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials("storageaccount", "accesskey"), true);
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("folderpath");

StringBuilder text = new StringBuilder();
string filePath = "C:\\Users\\username\\Desktop\\toPDF\\testing PDFs\\test.pdf";
if (File.Exists(filePath))
{
    PdfReader pdfReader = new PdfReader(filePath);
    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
    {
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
        currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
        text.Append(currentText);
    }
    pdfReader.Close();

    using (MemoryStream ms = new MemoryStream())
    {
        using (var doc = new iTextSharp.text.Document())
        {
            PdfWriter writer = PdfWriter.GetInstance(doc, ms);
            doc.Open();
            doc.Add(new Paragraph(text.ToString()));
        }
        var byteArray = ms.ToArray();
        var blobName = "test.pdf";
        var blob = container.GetBlockBlobReference(blobName);
        blob.Properties.ContentType = "application/pdf";
        blob.UploadFromByteArray(byteArray, 0, byteArray.Length);
    }
}
I found a simple solution; here is the code:
string filePath = "C:\\Users\\username\\Desktop\\toPDF\\testing PDFs\\rpa.pdf";
var credentials = new StorageCredentials("storageaccount", "accesskey");
var client = new CloudBlobClient(new Uri("https://jpllanatest.blob.core.windows.net/"), credentials);

// Retrieve a reference to a container. (You need to create one using the management portal, or call container.CreateIfNotExists())
var container = client.GetContainerReference("folderpath");

// Retrieve a reference to a blob named "myfile.pdf".
var blockBlob = container.GetBlockBlobReference("myfile.pdf");

// Create or overwrite the "myfile.pdf" blob with contents from a local file.
using (var fileStream = System.IO.File.OpenRead(filePath))
{
    blockBlob.UploadFromStream(fileStream);
}
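If the PDF is not on disk but already in memory (for example, bytes posted back from the jsPDF client), the same blob reference can take a byte array directly. This is a sketch under that assumption; base64FromClient is a hypothetical input holding the jsPDF output:
var blockBlob = container.GetBlockBlobReference("myfile.pdf");
blockBlob.Properties.ContentType = "application/pdf";

byte[] pdfBytes = Convert.FromBase64String(base64FromClient); // hypothetical: PDF posted from the browser as base64
blockBlob.UploadFromByteArray(pdfBytes, 0, pdfBytes.Length);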
Click on your solution in Visual Studio, then Add => Add Connected Service => select Azure Storage, and go through the wizard (if needed, you can create the storage account there; the wizard has that option). After that, your solution will be configured with all the needed settings (connection string included), and VS will open a page in your browser with a detailed tutorial on how to use Azure Storage. As it has the information and the needed code pieces, I will not include it here (it will likely change in the future, and I want to avoid including information that may become outdated).
Tutorial about Add Connected Service => Azure Storage functionality.

ASP.NET MVC EPPlus Download Excel File

So I'm using the fancy EPPlus library to write an Excel file and output it to the user to download. For the following method I'm just using some test data to minimize on the code, then I'll add the code I'm using to connect to database later. Now I can download a file all fine, but when I go to open the file, Excel complains that it's not a valid file and might be corrupted. When I go to look at the file, it says it's 0KB big. So my question is, where am I going wrong? I'm assuming it's with the MemoryStream. Haven't done much work with streams before so I'm not exactly sure what to use here. Any help would be appreciated!
[Authorize]
public ActionResult Download_PERS936AB()
{
    ExcelPackage pck = new ExcelPackage();
    var ws = pck.Workbook.Worksheets.Add("Sample1");
    ws.Cells["A1"].Value = "Sample 1";
    ws.Cells["A1"].Style.Font.Bold = true;
    var shape = ws.Drawings.AddShape("Shape1", eShapeStyle.Rect);
    shape.SetPosition(50, 200);
    shape.SetSize(200, 100);
    shape.Text = "Sample 1 text text text";

    var memorystream = new MemoryStream();
    pck.SaveAs(memorystream);
    return new FileStreamResult(memorystream, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet") { FileDownloadName = "PERS936AB.xlsx" };
}
Here's what I'm using - I've been using this for several months now and haven't had an issue:
public ActionResult ChargeSummaryData(ChargeSummaryRptParams rptParams)
{
    var fileDownloadName = "sample.xlsx";
    var contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    var package = CreatePivotTable(rptParams);

    var fileStream = new MemoryStream();
    package.SaveAs(fileStream);
    fileStream.Position = 0;

    var fsr = new FileStreamResult(fileStream, contentType);
    fsr.FileDownloadName = fileDownloadName;
    return fsr;
}
One thing I noticed right off the bat is that you don't reset your file stream position back to 0.
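Applied to the action in the question, the only change needed is rewinding the stream before handing it to FileStreamResult:
var memorystream = new MemoryStream();
pck.SaveAs(memorystream);
memorystream.Position = 0; // rewind; without this the stream is read from its end and the download is 0 KB
return new FileStreamResult(memorystream, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet") { FileDownloadName = "PERS936AB.xlsx" };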