How do I convert several web pages into one pdf document?


public FileResult Download()
{
var doc = new EO.Pdf.PdfDocument();
EO.Pdf.HtmlToPdf.ConvertUrl("http://www.google.com/", doc);
var ms = new MemoryStream();
doc.Save(ms);
ms.Position = 0;
return new FileStreamResult(ms, "application/pdf")
{
FileDownloadName = "download.pdf"
};
}
Could you show how to extend the code above so that it converts several web pages into a single PDF document?
The tricky part is that we don't know in advance which pages a user will want to convert, so hardcoding the URLs as in the code above doesn't help us.
Any help is greatly appreciated.
//Create a new PdfDocument object
var doc = new EO.Pdf.PdfDocument();
//Convert two or more different pages into the same PdfDocument
EO.Pdf.HtmlToPdf.ConvertUrl("c:\\1.html", doc);
EO.Pdf.HtmlToPdf.ConvertUrl("c:\\2.html", doc);
Latest code:
public FileResult Download()
{
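var doc = new EO.Pdf.PdfDocument();
foreach(var url in passedUrls)
{
//Each call appends the converted page to the same document
EO.Pdf.HtmlToPdf.ConvertUrl(url, doc);
}
//Save once, after every URL has been converted
var ms = new MemoryStream();
doc.Save(ms);
ms.Position = 0;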
return new FileStreamResult(ms, "application/pdf")
{
FileDownloadName = "download.pdf"
};
}
Latest from Adam (thank you sir)
public FileResult Download()
{
var documents = new List<EO.Pdf.PdfDocument>();
foreach(var url in passedUrls)
{
var doc = new EO.Pdf.PdfDocument();
EO.Pdf.HtmlToPdf.ConvertUrl(url, doc);
documents.Add(doc);
}
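EO.Pdf.PdfDocument mergedDocument = EO.Pdf.PdfDocument.Merge(documents.ToArray());
//Save the merged document to a stream and return it as a download
var ms = new MemoryStream();
mergedDocument.Save(ms);
ms.Position = 0;
return new FileStreamResult(ms, "application/pdf") { FileDownloadName = "download.pdf" };
}
Hopefully others find this code useful.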

Based on the Help documentation I would recommend the following:
public FileResult Download()
{
var urls = new List<string>
{ // Populate list with urls
"C:\\1.html",
"C:\\2.html"
};
var documents = new List<EO.Pdf.PdfDocument>();
foreach(var url in urls)
{
var doc = new EO.Pdf.PdfDocument();
EO.Pdf.HtmlToPdf.ConvertUrl(url, doc);
documents.Add(doc);
}
EO.Pdf.PdfDocument mergedDocument = EO.Pdf.PdfDocument.Merge(documents.ToArray());
var ms = new MemoryStream();
mergedDocument.Save(ms);
ms.Position = 0;
return new FileStreamResult(ms, "application/pdf") { FileDownloadName = "download.pdf" };
}

Pass an array of URL strings to the function, convert each one into the same document, and save once after the loop:
foreach(var url in passedUrls)
{
EO.Pdf.HtmlToPdf.ConvertUrl(url, doc);
}
doc.Save(ms);
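Since the question says the URLs come from users and are not known in advance, it may be worth filtering them before they reach ConvertUrl. The following is a minimal sketch using only the .NET base class library; `UrlFilter` and `KeepHttpUrls` are hypothetical names, not part of EO.Pdf. Note that it deliberately drops local file paths such as "C:\\1.html", so it only fits the case where users are expected to pass web addresses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlFilter
{
    // Hypothetical helper: keeps only absolute http/https URLs and drops duplicates.
    public static List<string> KeepHttpUrls(IEnumerable<string> passedUrls)
    {
        return passedUrls
            .Where(u => Uri.TryCreate(u, UriKind.Absolute, out var uri)
                        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
            .Distinct()
            .ToList();
    }
}
```

The filtered list could then be used in place of passedUrls in the loop above.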

Related

Blazor WASM Load and display large pdfs by splitting them as streams

I'm working on a Blazor WASM app and I want my users to be able to easily open PDF files at specific pages that contain additional information.
I cannot distribute those files myself or upload them to any kind of server; each user has to provide them.
Because the files are up to 60 MB, I cannot convert the whole uploaded file to base64 and display it as described here.
However, I don't have to display the whole file; I could load just the needed page plus or minus a few pages around it.
For that I tried using iText7's ExtractPageRange(). This answer indicates that I have to override the GetNextPdfWriter() method and store all streams in a collection.
class ByteArrayPdfSplitter : PdfSplitter {
public ByteArrayPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument) {
}
protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange) {
CurrentMemoryStream = new MemoryStream();
UsedStreams.Add(CurrentMemoryStream);
return new PdfWriter(CurrentMemoryStream);
}
public MemoryStream CurrentMemoryStream { get; private set; }
public List<MemoryStream> UsedStreams { get; set; } = new List<MemoryStream>();
}
Then I thought I could merge those streams and convert them to base64:
var file = loadedFiles.First();
using (MemoryStream ms = new MemoryStream())
{
var rs = file.OpenReadStream(maxFileSize);
await rs.CopyToAsync(ms);
ms.Position = 0;
//rs had to be copied into ms because the PdfReader constructor uses a
//synchronous read that rs doesn't support, so it throws an exception.
PdfReader pdfReader = new PdfReader(ms);
var document = new PdfDocument(pdfReader);
var splitter = new ByteArrayPdfSplitter(document);
var range = new PageRange();
range.AddPageSequence(1, 10);
var splitDoc = splitter.ExtractPageRange(range);
//Edit: commented this out; it shouldn't have been here at all and leads to an exception
//splitDoc.Close();
var outputMs = new MemoryStream();
foreach (var usedMs in splitter.UsedStreams)
{
usedMs.Position = 0;
outputMs.Position = outputMs.Length;
await usedMs.CopyToAsync(outputMs);
}
var data = outputMs.ToArray();
currentPdfContent = "data:application/pdf;base64,";
currentPdfContent += Convert.ToBase64String(data);
pdfLoaded = true;
}
This, however, doesn't work.
Does anyone have a suggestion for how to get this working, or maybe a simpler solution I could try?
Edit:
I took a closer look in the debugger and it seems that the resulting stream outputMs is always empty, so the problem is probably in how I split the PDF.

After at least partially clearing up my misconception about what it means to be unable to access the file system from Blazor WASM, I managed to find a working solution.
await using MemoryStream ms = new MemoryStream();
var rs = file.OpenReadStream(maxFileSize);
await using var fs = new FileStream("test.pdf", FileMode.Create);
fs.Position = 0;
await rs.CopyToAsync(fs);
fs.Close();
string path = "test.pdf";
string range = "10 - 15";
var pdfDocument = new PdfDocument(new PdfReader("test.pdf"));
var split = new MySplitter(pdfDocument);
var result = split.ExtractPageRange(new PageRange(range));
result.Close();
await using var splitFs = new FileStream("split.pdf", FileMode.Open);
await splitFs.CopyToAsync(ms);
var data = ms.ToArray();
var pdfContent = "data:application/pdf;base64,";
pdfContent += System.Convert.ToBase64String(data);
Console.WriteLine(pdfContent);
currentPdfContent = pdfContent;
With the MySplitter Class from this answer.
class MySplitter : PdfSplitter
{
public MySplitter(PdfDocument pdfDocument) : base(pdfDocument)
{
}
protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
{
String toFile = "split.pdf";
return new PdfWriter(toFile);
}
}

Create a DocX and save to client machine c#

I am using the library from:
https://github.com/xceedsoftware/DocX/blob/master/Xceed.Document.NET/Src/Document.cs
in an MVC project.
The functionality currently lives on my server:
public void Build(Results umbracoFormValues)
{
var ms = new MemoryStream();
using (var document = DocX.Create(ms))
{
var heading = new Heading();
heading.Branding(document);
var sections = umbracoFormValues.SectionResults;
foreach (var section in sections)
{
heading.Render(document, section);
ConstructFieldTypes(document, section);
new Footer().Render(document);
}
ms.Position = 0;
document.SaveAs(new CreateNew().DocumentFileName(umbracoFormValues));
}
}
How do I then grab the newly created file and make it downloadable to the user/client?
Many thanks in advance.

You can change your Build method to return the stream:
public Stream Build(Results umbracoFormValues)
{
var ms = new MemoryStream();
//code omitted for simplicity
//rewind so the caller reads the document from the start
ms.Position = 0;
return ms;
}
Then in your action:
public IActionResult MyAction(Results umbracoFormValues)
{
var stream = Build(umbracoFormValues);
//MIME type for .docx files
var mimeType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
var fileName = "myReport.docx";
return File(stream, mimeType, fileName);
}
Keep in mind that there may be some differences depending on which version of .NET Core or .NET Framework you're using, as you haven't specified that.

DotNetZip creates 0kb files when passing in memory stream

I have a Razor page that should generate a zip file containing multiple CSV files.
It works fine when I just want to generate one file, e.g.
public async Task<FileStreamResult> OnGet(int id)
{
var bankDetails = _paymentFileGenerator.GeneratePaymentFiles(id);
await using var memoryStream = new MemoryStream();
await using var streamWriter = new StreamWriter(memoryStream);
await using var csvWriter = new CsvWriter(streamWriter, CultureInfo.InvariantCulture)
{
Configuration = { HasHeaderRecord = false, }
};
csvWriter.WriteRecords(bankDetails);
streamWriter.Flush();
return new FileStreamResult(new MemoryStream(memoryStream.ToArray()), new MediaTypeHeaderValue("text/csv"))
{
FileDownloadName = "bacs.csv"
};
}
But when I try to pass memory streams for two files into a DotNetZip stream the zip downloads to the browser but both files are 0kb. Any thoughts on why?
public async Task<FileStreamResult> OnGet(int id)
{
var bankFiles = _paymentFileGenerator.GeneratePaymentFiles(id);
using var zipStream = new MemoryStream();
using var zip = new ZipFile();
await using var bankFileStream = new MemoryStream();
await using var bankFileStreamWriter = new StreamWriter(bankFileStream);
await using var bankFileCsvWriter = new CsvWriter(bankFileStreamWriter, CultureInfo.InvariantCulture)
{
Configuration = { HasHeaderRecord = false, }
};
bankFileCsvWriter.WriteRecords(bankFiles.BankFile);
bankFileCsvWriter.Flush();
bankFileStream.Seek(0, SeekOrigin.Begin);
zip.AddEntry("bacs.csv", (name, stream) => bankFileStream.ToArray());
await using var internalFileStream = new MemoryStream();
await using var internalFileStreamWriter = new StreamWriter(internalFileStream);
await using var internalFileCsvWriter = new CsvWriter(internalFileStreamWriter, CultureInfo.InvariantCulture);
internalFileCsvWriter.WriteRecords(bankFiles.InternalFile);
internalFileCsvWriter.Flush();
internalFileStream.Seek(0, SeekOrigin.Begin);
zip.AddEntry("internal.csv", (name, stream) => internalFileStream.ToArray());
zip.Save(zipStream);
zipStream.Seek(0, SeekOrigin.Begin);
return new FileStreamResult(new MemoryStream(zipStream.ToArray()), new MediaTypeHeaderValue("application/zip"))
{
FileDownloadName = "paymentbatch.zip"
};
}
I've seen other StackOverflow posts where people suggested adding the Seek() function to reset the position of the streams but it didn't work for me whether that was there or not.
When debugging, I can see that the 'bankfileStream' stream has bytes in it when I call the zip.AddEntry() but then the zipStream shows 0 bytes when I call zip.Save(zipStream).
Any suggestions appreciated!

I tried many different options and nothing worked until I used the SharpZipLib library instead. Here is the full solution:
public async Task<FileStreamResult> OnGet(int id)
{
var bankFiles = _paymentFileGenerator.GeneratePaymentFiles(id);
var bankFileBytes = await GetCsvFileBytes(bankFiles.BankFile, includeHeader: false);
var internalFileBytes = await GetCsvFileBytes(bankFiles.InternalFile);
var files = new List<AttachedFile>
{
new AttachedFile("bacs.csv", bankFileBytes),
new AttachedFile("internal.csv", internalFileBytes)
};
var zipStream = AddFilesToZip(files);
return new FileStreamResult(zipStream, new MediaTypeHeaderValue("application/zip"))
{
FileDownloadName = "paymentbatch.zip"
};
}
public MemoryStream AddFilesToZip(List<AttachedFile> attachedFiles)
{
var outputMemStream = new MemoryStream();
using (var zipStream = new ZipOutputStream(outputMemStream))
{
// 0-9, 9 being the highest level of compression
zipStream.SetLevel(3);
foreach (var file in attachedFiles)
{
var newEntry = new ZipEntry(file.Name) {DateTime = DateTime.Now};
zipStream.PutNextEntry(newEntry);
StreamUtils.Copy(new MemoryStream(file.Bytes), zipStream, new byte[4096]);
}
zipStream.CloseEntry();
// Stop ZipStream.Dispose() from also Closing the underlying stream.
zipStream.IsStreamOwner = false;
}
outputMemStream.Position = 0;
return outputMemStream;
}
private static async Task<byte[]> GetCsvFileBytes<T>(List<T> records, bool includeHeader = true) where T : class
{
await using var bankFileStream = new MemoryStream();
await using var bankFileStreamWriter = new StreamWriter(bankFileStream);
await using var bankFileCsvWriter = new CsvWriter(bankFileStreamWriter, CultureInfo.InvariantCulture)
{
Configuration = {HasHeaderRecord = includeHeader}
};
bankFileCsvWriter.WriteRecords(records);
bankFileStreamWriter.Flush();
return bankFileStream.ToArray();
}
public class AttachedFile
{
public byte[] Bytes { get; set; }
public string Name { get; set; }
public AttachedFile(string name, byte[] bytes)
{
Bytes = bytes;
Name = name;
}
}
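As an alternative that is not taken from the answer above, the same in-memory zip can probably be built without a third-party library by using System.IO.Compression.ZipArchive from the .NET base class library. This is only a sketch under that assumption; `InMemoryZip` and `Build` are hypothetical names, and the input is a plain dictionary of file names to bytes rather than the AttachedFile class.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class InMemoryZip
{
    // Hypothetical helper: writes each (name, bytes) pair as one zip entry.
    public static MemoryStream Build(IReadOnlyDictionary<string, byte[]> files)
    {
        var output = new MemoryStream();
        // leaveOpen: true keeps 'output' usable after the archive is disposed.
        // Disposing the archive is what writes the zip central directory.
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var file in files)
            {
                var entry = archive.CreateEntry(file.Key, CompressionLevel.Fastest);
                using var entryStream = entry.Open();
                entryStream.Write(file.Value, 0, file.Value.Length);
            }
        }
        // Rewind so the caller reads the zip from the beginning.
        output.Position = 0;
        return output;
    }
}
```

The returned stream could be passed to FileStreamResult in the same way as the zipStream in OnGet above.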

iText7 Create PDF in memory instead of physical file

How does one create a PDF in a MemoryStream instead of a physical file using iText7?
I have no idea how to do it in the latest version. Any help?
I tried the following code, but pdfSM is not properly populated:
string filePath = "./abc.pdf";
MemoryStream pdfSM = new ByteArrayOutputStream();
PdfDocument doc = new PdfDocument(new PdfReader(filePath), new PdfWriter(pdfSM));
.......
doc.close();
The full test code is below for your reference. It worked when I passed filePath into PdfWriter, but not with the memory stream:
public static readonly String sourceFolder = "../../FormTest/";
public static readonly String destinationFolder = "../../Output/";
static void Main(string[] args)
{
String srcFilePattern = "I-983";
String destPattern = "I-129_2014_";
String src = sourceFolder + srcFilePattern + ".pdf";
String dest = destinationFolder + destPattern + "_flattened.pdf";
MemoryStream returnSM = new MemoryStream();
PdfDocument doc = new PdfDocument(new PdfReader(src), new PdfWriter(returnSM));
PdfAcroForm form = PdfAcroForm.GetAcroForm(doc, false);
foreach (PdfFormField field in form.GetFormFields().Values)
{
var fieldName = field.GetFieldName();
var type = field.GetType();
if (fieldName != null)
{
if (type.Name.Equals("PdfTextFormField"))
{
field.SetValue("T");
}
}
}
form.FlattenFields();
doc.Close();
}

This works for me.
public byte[] CreatePdf()
{
var stream = new MemoryStream();
var writer = new PdfWriter(stream);
var pdf = new PdfDocument(writer);
var document = new Document(pdf);
document.Add(new Paragraph("Hello world!"));
document.Close();
return stream.ToArray();
}

I needed the same thing. Got it working like this:
(I included some settings which improve performance)
string HtmlString = "<html><head></head><body>some content</body></html>";
byte[] buffer;
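PdfDocument pdfDoc = null;
//Assumption: wp was not defined in the original snippet; default WriterProperties are used here
WriterProperties wp = new WriterProperties();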
using (MemoryStream memStream = new MemoryStream())
{
using(PdfWriter pdfWriter = new PdfWriter(memStream, wp))
{
pdfWriter.SetCloseStream(true);
using (pdfDoc = new PdfDocument(pdfWriter))
{
ConverterProperties props = new ConverterProperties();
pdfDoc.SetDefaultPageSize(PageSize.LETTER);
pdfDoc.SetCloseWriter(true);
pdfDoc.SetCloseReader(true);
pdfDoc.SetFlushUnusedObjects(true);
HtmlConverter.ConvertToPdf(HtmlString, pdfDoc, props);
pdfDoc.Close();
}
}
buffer = memStream.ToArray();
}
return buffer;

iText7, C# controller
This does not work:
public ActionResult Report()
{
//...
doc1.Close();
return File(memoryStream1, "application/pdf", "pdf_file_name.pdf");
}
This works:
public ActionResult Report()
{
//...
doc1.Close();
byte[] byte1 = memoryStream1.ToArray();
return File(byte1, "application/pdf", "pdf_file_name.pdf");
}
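I don't know why, but it works! (A likely reason: doc1.Close() also closes memoryStream1, so File(...) can no longer read from it, whereas ToArray() still works on a closed MemoryStream.)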
another: link
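One plausible explanation for the difference between the two Report() versions, assuming doc1.Close() also closes memoryStream1, is that MemoryStream.ToArray() keeps working after the stream is closed while reads do not. The sketch below shows that behavior with the base class library only; `ClosedStreamDemo` is a hypothetical name and Dispose() stands in for the document closing the stream.

```csharp
using System;
using System.IO;

public static class ClosedStreamDemo
{
    public static (int ArrayLength, bool ReadThrew) Run()
    {
        var stream = new MemoryStream();
        stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
        stream.Dispose(); // stands in for doc1.Close() closing the stream

        // ToArray still returns the written bytes after the stream is closed.
        byte[] bytes = stream.ToArray();

        // Reading from the closed stream fails, which is what a
        // stream-based File(...) result would need to do.
        bool readThrew = false;
        try
        {
            stream.ReadByte();
        }
        catch (ObjectDisposedException)
        {
            readThrew = true;
        }
        return (bytes.Length, readThrew);
    }
}
```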

Generate a pdf from view Mvc 3

Today I received a task to generate a PDF from a view. As I am a beginner in programming, could someone help me with this task by passing on some tips on where to start researching? I'm having difficulty with it.
I tried to use the example at this link:
http://www.codeproject.com/Articles/260470/PDF-reporting-using-ASP-NET-MVC3
but it always throws an error in this part of the code:
public byte[] Render(string htmlText, string pageTitle)
{
byte[] renderedBuffer;
using (var outputMemoryStream = new MemoryStream())
{
using (var pdfDocument = new Document(PageSize.A4, HorizontalMargin, HorizontalMargin, VerticalMargin, VerticalMargin))
{
PdfWriter pdfWriter = PdfWriter.GetInstance(pdfDocument, outputMemoryStream);
pdfWriter.CloseStream = false;
pdfWriter.PageEvent = new PrintHeaderFooter { Title = pageTitle };
pdfDocument.Open();
using (var htmlViewReader = new StringReader(htmlText))
{
using (var htmlWorker = new HTMLWorker(pdfDocument))
{
htmlWorker.Parse(htmlViewReader); // the error is thrown here
}
}
}
renderedBuffer = new byte[outputMemoryStream.Position];
outputMemoryStream.Position = 0;
outputMemoryStream.Read(renderedBuffer, 0, renderedBuffer.Length);
}
return renderedBuffer;
}

This is not an answer to your question per se, but we mostly use NReco.PDF.
An example of the code to generate a PDF from HTML content would be:
var htmlContent = String.Format("<body>Hello world: {0}</body>",
DateTime.Now);
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
var pdfBytes = htmlToPdf.GeneratePdf(htmlContent);
So the only thing missing in this example is getting the output of your MVC controller as HTML.
