iText7 & C# extract pages from PDF stored in MSSQL DB


I have a PDF document already stored in the database. It's stored as an "image" datatype (I had no choice with that). I'm working in C# .NET with iText 7, and the database is MSSQL. I want to split that stored data into smaller PDF files. I don't have the original PDF file, only the image data stored in the database.
I want to separate the original PDF data (image) into pages. That is, as if I had the original file, I want to split it every 2 pages into new files (e.g., a 10-page PDF would become 5 files, 2 pages each). I then want to store those smaller PDFs in the database as well.
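The chunking described above (a 10-page PDF becoming five 2-page files) is plain range arithmetic over 1-based page numbers, which is how iText numbers pages. A minimal sketch; `PageChunks` and `Ranges` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class PageChunks
{
    // Returns 1-based, inclusive (First, Last) page ranges of at most `size` pages each.
    public static List<(int First, int Last)> Ranges(int pageCount, int size = 2)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        var ranges = new List<(int First, int Last)>();
        for (int first = 1; first <= pageCount; first += size)
        {
            // The last chunk is shorter when pageCount is not a multiple of size.
            ranges.Add((first, Math.Min(first + size - 1, pageCount)));
        }
        return ranges;
    }
}
```

`PageChunks.Ranges(10)` yields (1, 2), (3, 4), ..., (9, 10); an odd page count leaves a final 1-page chunk.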
Is there any way to do this entirely in code? Or do I need to create a PDF file in the file system, and then create new files based on the file, and then import the individual files back into the database?
Thanks for the help.

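Read about MemoryStream: it can hold your data in memory without creating a file. For example (iText 7):
// database part here: read the "image" column into a byte array
byte[] pdf = null;
using (SqlDataReader dr = cmd.ExecuteReader())
{
    if (dr.Read())
    {
        pdf = (byte[])dr["image_pdf"];
    }
}
// PDF part here: open the bytes with iText 7 directly from memory, no file needed
using (var ms = new MemoryStream(pdf))
using (var source = new PdfDocument(new PdfReader(ms)))
{
    int pageCount = source.GetNumberOfPages();
    // ... copy each page range into a new in-memory PdfDocument with source.CopyPagesTo(...)
}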
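Building on the MemoryStream idea, here is a fuller sketch of the split-and-store round trip. It assumes iText 7 (the `itext7` NuGet package, namespace `iText.Kernel.Pdf`) and `System.Data.SqlClient`. `PdfDocument.CopyPagesTo` copies an inclusive, 1-based page range into another document, and every chunk is written to its own `MemoryStream`, so no temporary files are involved. `PdfChunker`, `SplitByPageCount`, `StoreChunks`, and the table `pdf_parts` with columns `source_id`, `part_no`, `image_pdf` are hypothetical placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using iText.Kernel.Pdf;

public static class PdfChunker
{
    // Splits an in-memory PDF into standalone PDFs of at most `pagesPerChunk` pages each.
    // Nothing touches the file system: input and output are byte arrays.
    public static List<byte[]> SplitByPageCount(byte[] sourcePdf, int pagesPerChunk = 2)
    {
        if (pagesPerChunk < 1) throw new ArgumentOutOfRangeException(nameof(pagesPerChunk));
        var chunks = new List<byte[]>();
        using (var source = new PdfDocument(new PdfReader(new MemoryStream(sourcePdf))))
        {
            int total = source.GetNumberOfPages();
            for (int first = 1; first <= total; first += pagesPerChunk)
            {
                int last = Math.Min(first + pagesPerChunk - 1, total);
                var output = new MemoryStream();
                using (var chunk = new PdfDocument(new PdfWriter(output)))
                {
                    // iText page numbers are 1-based and the range is inclusive.
                    source.CopyPagesTo(first, last, chunk);
                }
                // Disposing `chunk` finishes the PDF and closes `output`;
                // MemoryStream.ToArray() still works on a closed stream.
                chunks.Add(output.ToArray());
            }
        }
        return chunks;
    }

    // Inserts each chunk as its own row. Table "pdf_parts" and its columns are
    // hypothetical placeholders; adapt them to the real schema.
    public static void StoreChunks(SqlConnection connection, int sourceId, IList<byte[]> chunks)
    {
        const string sql =
            "INSERT INTO pdf_parts (source_id, part_no, image_pdf) VALUES (@sourceId, @partNo, @pdf)";
        for (int i = 0; i < chunks.Count; i++)
        {
            using (var cmd = new SqlCommand(sql, connection))
            {
                cmd.Parameters.Add("@sourceId", SqlDbType.Int).Value = sourceId;
                cmd.Parameters.Add("@partNo", SqlDbType.Int).Value = i + 1;
                // Matches the legacy "image" column type; use SqlDbType.VarBinary for varbinary(max).
                cmd.Parameters.Add("@pdf", SqlDbType.Image).Value = chunks[i];
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```

Usage, once the stored bytes are in `pdf`: `PdfChunker.StoreChunks(connection, sourceId, PdfChunker.SplitByPageCount(pdf, 2));`. Keeping the split and the insert separate lets the split be tested without a database.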

Related

iText7 C# - Merging same pdf document multiple times reduces file size

I have a .NET 6.0 Windows Forms application with iText7 added from the NuGet package manager.
In the UI I have:
An Open File dialog that lets the user select a single .pdf file
A text box where the user enters a number; the content of the selected PDF will be repeated this many times
A button that generates a single output .pdf file
When the user clicks the generate button, a single output.pdf needs to be generated, containing the content of the original .pdf duplicated as many times as the user specified in the text box.
Notes:
In the UI, the user can select any valid .pdf file (the content could be anything: text, images, or a combination). Let's say the size of this .pdf file is 2 MB.
Code I have:
void GenerateContentDuplicatedPDF(string existingFileFullPath, string outputFileFullPath, int iNumberOfMerges)
{
    using (var memoryStream = new MemoryStream())
    {
        using (PdfReader pdfReader = new PdfReader(existingFileFullPath))
        {
            using (PdfDocument SourceDocument1 = new PdfDocument(pdfReader))
            {
                WriterProperties properties = new WriterProperties().SetCompressionLevel(CompressionConstants.NO_COMPRESSION);
                using (PdfWriter pdfWriter = new PdfWriter(memoryStream, properties))
                {
                    using (PdfDocument pdfDocument = new PdfDocument(pdfWriter))
                    {
                        PdfMerger merge = new PdfMerger(pdfDocument);
                        for (int i = 0; i < iNumberOfMerges; i++)
                        {
                            merge.Merge(SourceDocument1, 1, SourceDocument1.GetNumberOfPages());
                        }
                        merge.Close();
                        SourceDocument1.Close();
                        byte[] result = memoryStream.ToArray();
                        File.WriteAllBytes(outputFileFullPath, result);
                    }
                }
            }
        }
    }
}
It works and the output is as expected. The generated .pdf has content duplicated as expected.
What is the issue:
I was expecting the output file that gets generated would have the size multiplied by the number of times user wants the content duplicated.
For example, if the user selects a 2 MB file and enters 10, I expect the output to be 20 MB. But iText7 generates a file of around 2.7 MB. This happens for all types of content, whether text, images, or a combination.
I have also set the compression level to NO_COMPRESSION, but the result is the same. I want the size of the generated output file to multiply as well.
Not sure what is going wrong here or if iText7 is cleverly optimized to minimize the pdf size with duplicate content when the file is generated. Can I override this behavior?
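A plausible explanation (an assumption, not confirmed in the question) is that iText 7 remembers which objects it has already copied from a given source `PdfDocument` and points later copies at the same objects, so a large image or font is stored once no matter how many pages show it. Compression settings cannot change that. If the duplication really is wanted, one sketch is to open a fresh source document for every copy, so the copies do not come from the same source instance. `PdfDuplicator` and `DuplicateContent` are hypothetical names; `PdfMerger` lives in `iText.Kernel.Utils`.

```csharp
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static class PdfDuplicator
{
    // Appends all pages of `sourcePdf` to a new document `copies` times.
    public static byte[] DuplicateContent(byte[] sourcePdf, int copies)
    {
        var output = new MemoryStream();
        using (var target = new PdfDocument(new PdfWriter(output)))
        {
            var merger = new PdfMerger(target);
            for (int i = 0; i < copies; i++)
            {
                // A new reader and document per iteration: assumption is that objects from a
                // different source document instance are not matched against earlier copies.
                using (var source = new PdfDocument(new PdfReader(new MemoryStream(sourcePdf))))
                {
                    merger.Merge(source, 1, source.GetNumberOfPages());
                }
            }
        }
        // Disposing `target` writes the finished PDF; ToArray() works on the closed stream.
        return output.ToArray();
    }
}
```

Usage with the question's parameters: `File.WriteAllBytes(outputFileFullPath, PdfDuplicator.DuplicateContent(File.ReadAllBytes(existingFileFullPath), iNumberOfMerges));`. Note that the shared-object behaviour is normally desirable, because it keeps merged files small.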

How to get binary data of PDF without generating

I have my PDF code and an existing PDF layout to which I add data to create a new PDF file, but I am not able to create the new PDF file or download it.
I need to read the newly generated PDF file and write out its binary data.
Please review the code below:
MemoryStream pdfms = new MemoryStream();
PdfReader reader;
reader = new PdfReader(HttpContext.Current.Server.MapPath("20171010_BillTemplate.pdf"));
PdfStamper formFiller = new PdfStamper(reader, pdfms);
AcroFields pdfBillingFields = formFiller.AcroFields;
pdfBillingFields.SetField("CT_Mail_Block",MailBlock.ToUpper());// some data
pdfBillingFields.SetField("Cash_Only", Cash_Only);
formFiller.FormFlattening = true;
formFiller.Writer.CloseStream = false;
reader = new PdfReader(pdfms); // giving error
formFiller.Close();
pdfms.Dispose();
What I need is to get the binary data of the generated file, with or without creating the new file on disk, and send it back in the response.
I have implemented this code in a Web API.
The reader throws this error:
PDF header signature not found.
Is this the right way to get the binary data from the code above?
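The "PDF header signature not found" error is consistent with reading `pdfms` before `formFiller.Close()` has run: the stamper writes the complete PDF only when it is closed, and at that point the stream's position is still at the end of whatever has been written so far. The following minimal sketch uses the same iTextSharp 5 API (`PdfReader`, `PdfStamper`, `AcroFields`). It closes the stamper first and then takes the bytes with `MemoryStream.ToArray()`, so no output file is needed. `BillPdf` and `FillTemplate` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

public static class BillPdf
{
    // Fills the template's form fields, flattens them and returns the finished PDF as bytes.
    public static byte[] FillTemplate(string templatePath, IDictionary<string, string> fields)
    {
        var output = new MemoryStream();
        var reader = new PdfReader(templatePath);
        try
        {
            var stamper = new PdfStamper(reader, output);
            foreach (var field in fields)
            {
                stamper.AcroFields.SetField(field.Key, field.Value);
            }
            stamper.FormFlattening = true;
            // Close() finishes writing the PDF into `output` (and closes the stream).
            stamper.Close();
        }
        finally
        {
            reader.Close();
        }
        // ToArray() still works after the MemoryStream has been closed.
        return output.ToArray();
    }
}
```

Usage with the question's fields: `FillTemplate(HttpContext.Current.Server.MapPath("20171010_BillTemplate.pdf"), new Dictionary<string, string> { { "CT_Mail_Block", MailBlock.ToUpper() }, { "Cash_Only", Cash_Only } })`. The Web API action can then send the returned byte array as the response body with content type `application/pdf`.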

Generating PDF in windows forms using XML reader

My Windows form contains a text box in which we enter HTML tags, and one button to generate a PDF.
We need to load the text box content into an XmlReader, process each XML element recursively, and then generate a PDF file.
The PDF file must render that data;
for example, if I enter a tag in the text box, the PDF file must display a table.
I am very new to Windows Forms and XML. Can anyone help me complete this task?
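The question asks to load the text box content into an XmlReader and walk the elements recursively. This is a minimal BCL-only sketch of that walk. It assumes the text box holds well-formed XHTML with a single root element; ordinary HTML often is not well-formed XML. The code loads the markup through `XmlReader` into an `XElement` tree and visits each element recursively. `MarkupWalker` and `Describe` are hypothetical names. In a real app, `Visit` is where each element would be mapped to a PDF construct, for example a table element to a PDF table. This sketch only records the structure.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class MarkupWalker
{
    // Returns one line per element, indented by depth, with the element's own text.
    public static List<string> Describe(string markup)
    {
        var lines = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(markup)))
        {
            XElement root = XElement.Load(reader);
            Visit(root, 0, lines);
        }
        return lines;
    }

    private static void Visit(XElement element, int depth, List<string> lines)
    {
        string indent = new string(' ', depth * 2);
        // Only text directly inside this element; children report their own text.
        string text = string.Concat(element.Nodes().OfType<XText>().Select(t => t.Value)).Trim();
        lines.Add(text.Length > 0 ? $"{indent}{element.Name.LocalName}: {text}" : indent + element.Name.LocalName);
        foreach (XElement child in element.Elements())
        {
            Visit(child, depth + 1, lines);
        }
    }
}
```

For `<table><tr><td>A</td><td>B</td></tr></table>`, `Describe` returns `table`, `  tr`, `    td: A`, `    td: B`.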

You would need to use a library to create PDF files. iTextSharp is a common library that can help. Take a look at the library and its samples; you should be able to create PDF files easily from your application:
https://sourceforge.net/projects/itextsharp/
iText is a PDF library that allows you to CREATE, ADAPT, INSPECT and MAINTAIN documents in the Portable Document Format (PDF):
iTextSharp is the .NET port.

I got it working with this simple code:
Document document = new Document();
PdfWriter.GetInstance(document, new FileStream(Request.PhysicalApplicationPath + "\\MySamplePDF.pdf", FileMode.Create));
document.Open();
iTextSharp.text.html.simpleparser.HTMLWorker hw =
new iTextSharp.text.html.simpleparser.HTMLWorker(document);
hw.Parse(new StringReader(htmlText));
document.Close();
But my problem is the path: I want to select it dynamically. Can anyone help me set the path dynamically in the code above?

PdfWriter.GetInstance(document, new FileStream(Request.PhysicalApplicationPath + "\\MySamplePDF.pdf", FileMode.Create));
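I changed this line to use a SaveFileDialog:
SaveFileDialog svg = new SaveFileDialog { Filter = "PDF files (*.pdf)|*.pdf", DefaultExt = "pdf" };
if (svg.ShowDialog() == DialogResult.OK) // the user may cancel the dialog
{
    // FileName is the full path; DefaultExt adds ".pdf" if the user omitted it
    PdfWriter.GetInstance(document, new FileStream(svg.FileName, FileMode.Create));
}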
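Making the output path a parameter keeps the PDF code independent of how the path is chosen, whether from a SaveFileDialog, a settings value, or a computed name. The following sketch uses the same iTextSharp 5 `HTMLWorker` calls as the code above. `HtmlPdfWriter` and `WriteHtmlAsPdf` are hypothetical names, and `HTMLWorker` was superseded by XMLWorker in later iTextSharp 5 releases.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

public static class HtmlPdfWriter
{
    // Renders simple HTML into a PDF at whatever path the caller supplies.
    public static void WriteHtmlAsPdf(string htmlText, string outputPath)
    {
        using (var stream = new FileStream(outputPath, FileMode.Create))
        {
            var document = new Document();
            PdfWriter.GetInstance(document, stream);
            document.Open();
            var worker = new HTMLWorker(document);
            worker.Parse(new StringReader(htmlText));
            // Closing the document flushes the PDF to the stream.
            document.Close();
        }
    }
}
```

In the Windows Forms case, call `HtmlPdfWriter.WriteHtmlAsPdf(htmlText, svg.FileName)` once the dialog has returned `DialogResult.OK`.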

Read a stored PDF from memory stream

I'm working on a database project using C# and SQL Server 2012. In one of my forms I store a PDF file, along with some other information, in a table. This works, but when I retrieve the stored information I don't know how to display the PDF file.
I read some articles saying it cannot be displayed with the Adobe PDF viewer from a memory stream. Is there any way to do that?
This is my code for retrieving the data from the database:
sql_com.CommandText = "select * from incoming_boks_tbl where [incoming_bok_id]=@incoming_id and [incoming_date]=@incoming_date";
sql_com.Parameters.AddWithValue("@incoming_id", up_inco_num_txt.Text);
sql_com.Parameters.AddWithValue("@incoming_date", up_inco_date_txt.Text);
sql_dr = sql_com.ExecuteReader();
if(sql_dr.HasRows)
{
while(sql_dr.Read())
{
up_incoming_id_txt.Text = sql_dr[0].ToString();
up_inco_num_txt.Text = sql_dr[1].ToString();
up_inco_date_txt.Text = sql_dr[2].ToString();
up_inco_reg_txt.Text = sql_dr[3].ToString();
up_inco_place_txt.Text = sql_dr[4].ToString();
up_in_out_txt.Text = sql_dr[5].ToString();
up_subj_txt.Text = sql_dr[6].ToString();
up_note_txt.Text = sql_dr[7].ToString();
string file_ext = sql_dr[8].ToString();//pdf file extension
byte[] inco_file = (byte[])(sql_dr[9]);//the pdf file
MemoryStream ms = new MemoryStream(inco_file);
//here I don't know what to do with the memory stream data or where to store it. How can I display it?
}
}
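If adding a rendering library is not an option, one workaround is to write the bytes read from `sql_dr[9]` to a temporary file and let the operating system open it. This assumes a PDF viewer such as Adobe Reader is installed and registered for .pdf files. Unlike an in-memory approach, it does write to disk. `PdfTempViewer` and its methods are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class PdfTempViewer
{
    // Writes the PDF bytes to a uniquely named file in the temp folder and returns its path.
    public static string SaveToTempPdf(byte[] pdfBytes)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".pdf");
        File.WriteAllBytes(path, pdfBytes);
        return path;
    }

    // Asks the OS to open the file with whatever application is registered for .pdf.
    public static void OpenWithDefaultViewer(string path)
    {
        // UseShellExecute must be true for file associations to apply on .NET Core / .NET 5+.
        Process.Start(new ProcessStartInfo(path) { UseShellExecute = true });
    }
}
```

Usage inside the read loop: `PdfTempViewer.OpenWithDefaultViewer(PdfTempViewer.SaveToTempPdf(inco_file));`.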

This answer should give you some options: How to render pdfs using C#
In the past I have used Google's open-source PDF rendering project, PDFium.
There is a C# nuget package called PdfiumViewer which gives a C# wrapper around PDFium and allows PDFs to be displayed and printed.
It works directly with streams, so it doesn't require any data to be written to disk.
This is my example from a WinForms app:
public void LoadPdf(byte[] pdfBytes)
{
    var stream = new MemoryStream(pdfBytes);
    LoadPdf(stream);
}
public void LoadPdf(Stream stream)
{
    // Create PDF Document (PdfiumViewer's PdfDocument, not the iText class of the same name)
    var pdfDocument = PdfiumViewer.PdfDocument.Load(stream);
    // Load PDF Document into the WinForms control (pdfRenderer is a PdfRenderer on the form)
    pdfRenderer.Load(pdfDocument);
}

iTextSharp - How to generate a RTF document in the ClipBoard instead of a file

I would like to generate a PDF or RTF document with the iTextSharp library that can be copied to the clipboard, using the same code I use to generate the document to a file (FileStream).
This way my application would give the user two options: generate to a file or to the clipboard.

Basically every iTextSharp document is attached to a System.IO.Stream.
Document doc = new Document(PageSize.A4);
RtfWriter2.GetInstance(doc, stream);
Usually we save the document to a file using a FileStream. To use the same code to put the document on the clipboard, we use a MemoryStream instead.
MemoryStream stream = new MemoryStream();
Document doc = new Document(PageSize.A4);
RtfWriter2.GetInstance(doc, stream);
// (...) document content
doc.Close();
string rtfText = Encoding.ASCII.GetString(stream.ToArray()); // ToArray() returns only the bytes written; GetBuffer() can include unused trailing bytes
stream.Close();
Clipboard.SetText(rtfText, TextDataFormat.Rtf);
I only had problems with images: it seems that iTextSharp exports images by writing their raw bytes after the \bin tag, whereas some libraries encode the binary content as hex characters. When I paste from memory into Word, the images don't appear, but if I load from a file, everything is OK. Any suggestions?
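Reading the RTF through `stream.GetBuffer()` instead of `stream.ToArray()` is a common pitfall. `GetBuffer()` returns the stream's whole internal buffer, including unused capacity past `Length`, so the decoded string can end with NUL characters. The following small BCL-only check shows the difference; `BufferDemo` is a hypothetical name.

```csharp
using System.IO;
using System.Text;

public static class BufferDemo
{
    // Writes `text` to a fresh MemoryStream and reports the three lengths involved.
    public static (long Written, int BufferLength, int ArrayLength) Compare(string text)
    {
        using (var stream = new MemoryStream())
        {
            byte[] data = Encoding.ASCII.GetBytes(text);
            stream.Write(data, 0, data.Length);
            // GetBuffer() exposes the internal array (capacity); ToArray() copies only Length bytes.
            return (stream.Length, stream.GetBuffer().Length, stream.ToArray().Length);
        }
    }
}
```

If `GetBuffer()` is kept for performance, pass `stream.Length` as the count, for example `Encoding.ASCII.GetString(stream.GetBuffer(), 0, (int)stream.Length)`.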
