Converting html to pdf using iText.Html2pdf is taking too long

Hi, I'm trying to convert an HTML string to PDF using iText.Html2pdf.
The conversion takes almost 3 minutes.
The code is the following (pretty much a basic example):
public byte[] ConvertToPdf(string email)
{
    using (var memoryStream = new MemoryStream())
    {
        var properties = new ConverterProperties()
            .SetBaseUri(".")
            .SetCreateAcroForm(false)
            .SetCssApplierFactory(new DefaultCssApplierFactory())
            .SetFontProvider(new DefaultFontProvider())
            .SetMediaDeviceDescription(MediaDeviceDescription.CreateDefault())
            .SetOutlineHandler(new OutlineHandler())
            .SetTagWorkerFactory(new DefaultTagWorkerFactory());
        memoryStream.Position = 0;
        HtmlConverter.ConvertToPdf(email, memoryStream, properties);
        return memoryStream.ToArray();
    }
}
The HTML in question produces a 6-page PDF.
Any suggestions?

Related

Convert MemoryStream to Array C#

I have implemented a code block to convert a Stream into a byte array; the snippet is shown below. Unfortunately, it throws an OutOfMemoryException while converting the MemoryStream to an array (at return newDocument.ToArray();). Could someone please help me with this?
public byte[] MergeToBytes()
{
    using (var processor = new PdfDocumentProcessor())
    {
        AppendStreamsToDocumentProcessor(processor);
        using (var newDocument = new MemoryStream())
        {
            processor.SaveDocument(newDocument);
            return newDocument.ToArray();
        }
    }
}
public Stream MergeToStream()
{
    return new MemoryStream(MergeToBytes());
}

Firstly: how big is the document? If it is too big for the byte[] limit, you're going to have to use a different approach.
However, a MemoryStream is already backed by an (oversized) array; you can get this simply using newDocument.TryGetBuffer(out var buffer), noting that you must restrict yourself to the portion of the .Array indicated by .Offset (usually, but not always, zero) and .Count (the number of bytes that should be considered "live"). Note that TryGetBuffer can return false, but not in the new MemoryStream() scenario.
It is also interesting that you're converting a MemoryStream to a byte[] and then back to a MemoryStream. An alternative here would have been to just set the Position back to 0, i.e. rewind it. So:
public Stream MergeToStream()
{
    using var processor = new PdfDocumentProcessor();
    AppendStreamsToDocumentProcessor(processor);
    var newDocument = new MemoryStream();
    processor.SaveDocument(newDocument);
    newDocument.Position = 0;
    return newDocument;
}
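The TryGetBuffer approach mentioned in this answer can be sketched as follows. This is only a minimal illustration using the base class library; the class and method names (BufferExample, GetLiveBytes) are made up for the example and are not part of the original code.

```csharp
using System;
using System.IO;

public static class BufferExample
{
    // Returns the live portion of the stream's backing array without copying it.
    public static ArraySegment<byte> GetLiveBytes(MemoryStream stream)
    {
        // TryGetBuffer returns false when the stream wraps a caller-supplied
        // array that was not marked as publicly visible.
        if (!stream.TryGetBuffer(out ArraySegment<byte> buffer))
        {
            throw new InvalidOperationException("The underlying buffer is not exposed.");
        }

        // Only the range [buffer.Offset, buffer.Offset + buffer.Count) holds
        // written data; the rest of buffer.Array is spare capacity.
        return buffer;
    }
}
```

A caller would then read from segment.Array starting at segment.Offset for segment.Count bytes, rather than assuming the whole array is valid. Note that this avoids the extra copy made by ToArray(), but it does not help if the document itself exceeds the maximum array size.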

How do I copy multiple streams into 1 for the client to download?

I'm using C# and ASP.NET Core 3 and have this right now:
string templatePath = Path.Combine(_webHostEnvironment.WebRootPath, @"templates\pdf\test.pdf");
Stream finalStream = new MemoryStream();
foreach (Info p in list)
{
    Stream pdfInputStream = new FileStream(path: templatePath, mode: FileMode.Open);
    Stream outStream = PdfService.FillForm(pdfInputStream, p);
    outStream.Position = 0;
    outStream.CopyTo(finalStream);
    outStream.Dispose();
    pdfInputStream.Dispose();
}
finalStream.Position = 0;
return File(finalStream, "application/pdf", "test.pdf");
Right now I just get the first PDF when there should be 3. How do I combine all the PDF streams created in the loop into one PDF? I'm using iTextSharp, with this article as a guide for the FillForm code:
https://medium.com/@taithienbo/fill-out-a-pdf-form-using-itextsharp-for-net-core-4b323cb58459

You can't combine PDFs just by appending them to a single stream :-)
Instead, add each PDF stream to a list, ask iTextSharp to combine them, and return the newly created stream.
var pdfStreams = new List<Stream>();
foreach (var item in list)
{
    // Open PDF + fill form, producing outStream as in the question
    pdfStreams.Add(outStream);
}
// Merge is a helper you would write yourself (see the link below)
var newStream = Merge(pdfStreams);
return File(newStream, "application/pdf", "test.pdf");
I don't know iTextSharp, but it seems you can merge PDFs with it: https://weblogs.sqlteam.com/mladenp/2014/01/10/simple-merging-of-pdf-documents-with-itextsharp-5-4-5/
Edit
By the way, you could use a "using" statement for the streams (you wouldn't have to call Dispose yourself). I don't know how heavy your PDFs are, but you should maybe consider using CopyToAsync.
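The two suggestions in the edit ("using" and CopyToAsync) might look roughly like this. This is a sketch with only base class library types; StreamCopyExample, CopyToMemoryAsync and the openSource parameter are hypothetical names chosen for the illustration. It copies a single stream and does not merge PDFs, which still needs a PDF library as described above.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamCopyExample
{
    // Copies one source stream into a new MemoryStream asynchronously.
    // openSource is a hypothetical factory, e.g. () => File.OpenRead(path).
    public static async Task<MemoryStream> CopyToMemoryAsync(Func<Stream> openSource)
    {
        var destination = new MemoryStream();

        // The using statement disposes the source even if the copy throws,
        // so no explicit Dispose() call is needed.
        using (Stream source = openSource())
        {
            await source.CopyToAsync(destination);
        }

        // Rewind so the caller can read from the beginning.
        destination.Position = 0;
        return destination;
    }
}
```

In a controller action this could be awaited directly; the destination stream is deliberately not disposed here because the caller still needs to read it.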

.NET C# DocX library: how to serialize a DocX document?

When using the DocX library, I generate the docx document on the server and then download it.
For that, I need to convert my document into an array of bytes.
To do that, I was previously saving the document as a physical file like this:
// Save all changes to this document.
document.SaveAs(GENERATED_DOCX_LOCATION);
return System.IO.File.ReadAllBytes(GENERATED_DOCX_LOCATION);
But I would rather not do that. Is it possible to serialize this object for download without saving it physically?
I already tried this:
private byte[] ObjectToByteArray(object obj)
{
    if (obj == null)
        return null;
    BinaryFormatter bf = new BinaryFormatter();
    using (MemoryStream ms = new MemoryStream())
    {
        bf.Serialize(ms, obj);
        return ms.ToArray();
    }
}
With :
return this.ObjectToByteArray(document);
But obviously, DocX doesn't implement ISerializable.
EDIT: the code below doesn't work either:
byte[] byteArray = null;
using (var stream = new MemoryStream())
{
    document.SaveAs(stream);
    byteArray = stream.ToArray();
}
return byteArray;

Try this: replace my c:\temp... paths with your document location, and this will read the file into a byte array and write it back out for you.
void Main()
{
    byte[] bytes = System.IO.File.ReadAllBytes(@"C:\temp\test.csv");
    using (var bw = new BinaryWriter(File.Open(@"C:\temp\test2.csv", FileMode.OpenOrCreate)))
    {
        bw.Write(bytes);
        bw.Flush();
    }
}

There was no way to do it via the original DocX library.
It's a shame, as this library uses a MemoryStream to manipulate the data.
To solve my problem, I simply added a new public read-only property to expose the private MemoryStream variable, then used this simple code.
Code added in the DocX project:
public MemoryStream DocumentMemoryStream { get { return this.memoryStream; } }
Code in my main function:
return document.DocumentMemoryStream.ToArray();

combining byte arrays to pdf

I have a byte array which I get from an API.
byte[] sticker = db.call_API_print_sticker(Id);
I have to call this method a number of times and then convert the results to PDF. I want to store them in an array of arrays and then convert them once I have them all.
How do I store them and then combine the byte array PDFs into one?

Using the PDFsharp NuGet package, I wrote the following C# method that works purely with byte arrays:
public byte[] CombinePDFs(List<byte[]> srcPDFs)
{
    using (var ms = new MemoryStream())
    {
        using (var resultPDF = new PdfDocument(ms))
        {
            foreach (var pdf in srcPDFs)
            {
                using (var src = new MemoryStream(pdf))
                {
                    // Import mode is required to copy pages into another document
                    using (var srcPDF = PdfReader.Open(src, PdfDocumentOpenMode.Import))
                    {
                        for (var i = 0; i < srcPDF.PageCount; i++)
                        {
                            resultPDF.AddPage(srcPDF.Pages[i]);
                        }
                    }
                }
            }
            resultPDF.Save(ms);
            return ms.ToArray();
        }
    }
}
The method above takes a list of source PDFs, combines them, and returns a single byte array for the resulting PDF.

The byte[] is probably just one PDF. I would think that you could just do
System.IO.File.WriteAllBytes(@"sticker.pdf", sticker);
If that is not the case, the easiest way would be to use a NuGet package, e.g. PdfSharp, to combine multiple PDFs into one.
An example of combining PDFs
The gist (which assumes each sticker contains 1 page):
IEnumerable<byte[]> stickers; // the PDFs returned by the API
using (var combinedPdf = new PdfDocument(@"stickers.pdf"))
    foreach (var pdf in stickers)
        using (MemoryStream ms = new MemoryStream(pdf))
        {
            // Import mode is required to add the page to another document
            var someSticker = PdfReader.Open(ms, PdfDocumentOpenMode.Import);
            combinedPdf.AddPage(someSticker.Pages[0]);
        }

Print PDF Files without third party classes C#

I've been searching for quite a while for how to print PDF files in C#.
I'm trying to print shipping labels, which I receive as a GZip-compressed string; the format is PDF.
So my question is: what is the best way to print the PDF label (not an image or any image format), and also be able to choose which printer to print to?
The best way would be without having to save the label on my computer and then reload the file!
It doesn't make sense that the only way is to install third-party classes!
This is what I have done:
private void PrintFDFLabel(string imageLabel)
{
    var byteStream = Convert.FromBase64String(imageLabel);
    MemoryStream memoryStream = Decompress(byteStream); // I need to decompress the GZip
    PrintDocument print = new PrintDocument();
    print.PrinterSettings.PrinterName = Properties.Settings.Default.DefaultPrimePrinter;
    print.Print();
}
private MemoryStream Decompress(byte[] b)
{
    var memoryStream = new MemoryStream();
    using (var bs = new MemoryStream(b))
    using (GZipStream gZipStream = new GZipStream(bs, CompressionMode.Decompress))
    {
        gZipStream.CopyTo(memoryStream);
    }
    memoryStream.Position = 0; // rewind so the caller reads from the start
    return memoryStream;
}
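The decoding step in the code above (Base64 string, then GZip, then raw PDF bytes) can be sketched on its own with only the base class library. The names LabelDecoder, DecodeLabel and EncodeLabel are hypothetical, and EncodeLabel exists only to build sample input; this does not address the printing part of the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class LabelDecoder
{
    // Decodes a Base64 string and decompresses its GZip payload into raw bytes.
    public static byte[] DecodeLabel(string base64GzipLabel)
    {
        byte[] compressed = Convert.FromBase64String(base64GzipLabel);
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    // Hypothetical inverse, used only to produce sample input for DecodeLabel.
    public static string EncodeLabel(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            // The GZipStream must be disposed before reading the result,
            // otherwise the compressed data may be incomplete.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }
}
```

A quick sanity check on real data is to confirm that the decoded bytes start with the ASCII characters %PDF, which is how PDF files begin.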
