I would like to generate a PDF from a View, but I don't want to display it right after generating it; I just want to save it to disk.
Because I am on Azure, I had to use the Docker version, but it did not print the footer (page count).
So I will use iText7 to add the footer (page count), delete the original PDF, and display the new output.
It is a lot of work, but it is the only way I found, since Rotativa and other components built on wkhtmltopdf.org did not render the CSS correctly.
So my problem is:
How to save the PDF without displaying it?
With the example of the site:
https://jsreport.net/learn/dotnet-aspnetcore#save-to-file
It needs the return View(), which means I cannot display the modified PDF instead of the original.
[MiddlewareFilter(typeof(JsReportPipeline))]
public async Task<IActionResult> InvoiceWithHeader()
{
HttpContext.JsReportFeature().Recipe(Recipe.ChromePdf);
HttpContext.JsReportFeature().OnAfterRender((r) => {
using (var file = System.IO.File.Open("report.pdf", FileMode.Create))
{
r.Content.CopyTo(file);
}
r.Content.Seek(0, SeekOrigin.Begin);
});
return View(InvoiceModel.Example());
}
OnAfterRender does not solve my problem. Is there a way to perform the steps below, or is there a better solution?
Generate PDF of Action by JsReport
Save PDF from JsReport
Add page count by iText7
Delete original PDF from JsReport
View the pdf modified by iText7 in the view
NOTE: using new jsreport.Local.LocalReporting() works perfectly; the problem only appeared after deploying to Azure.
Update:
I tried the following, but it didn't work:
var htmlContent = await JsReportMVCService.RenderViewToStringAsync(HttpContext, RouteData, "/Views/OcorrenciaTalaos/GerarPdf.cshtml", retorno);
(var contentType, var generatedFile) = await GeneratePDFAsync(htmlContent);
using (var fileStream = new FileStream("tempJsReport.pdf", FileMode.Create))
{
await generatedFile.CopyToAsync(fileStream);
}
public async Task<(string ContentType, MemoryStream GeneratedFileStream)> GeneratePDFAsync(string htmlContent)
{
IJsReportFeature feature = new JsReportFeature(HttpContext);
feature.Recipe(Recipe.ChromePdf);
if (!feature.Enabled) return (null, null);
feature.RenderRequest.Template.Content = htmlContent;
// var htmlContent = await JsReportMVCService.RenderViewToStringAsync(HttpContext, RouteData, "GerarPdf", retorno);
var report = await JsReportMVCService.RenderAsync(feature.RenderRequest);
var contentType = report.Meta.ContentType;
MemoryStream ms = new MemoryStream();
report.Content.CopyTo(ms);
return (contentType, ms);
}
You can overwrite the final response inside the OnAfterRender this way:
[MiddlewareFilter(typeof(JsReportPipeline))]
public IActionResult InvoiceDownload()
{
HttpContext.JsReportFeature().Recipe(Recipe.ChromePdf)
.OnAfterRender((r) =>
{
// write current report to file
using(var fileStream = System.IO.File.Create("c://temp/out.pdf"))
{
r.Content.CopyTo(fileStream);
}
// do modifications
// ...
// overwrite response with a new pdf
r.Content = System.IO.File.OpenRead("c://temp/final.pdf");
});
return View("Invoice", InvoiceModel.Example());
}
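The `// do modifications` step above could be filled in with iText7 roughly as follows. This is only a sketch, assuming the iText7 `itext7` NuGet package is referenced; the class name `PageNumberStamper`, the footer text, and the position 20 points above the bottom edge are illustrative choices, not something the original answer prescribes.

```csharp
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
using iText.Layout.Properties;

// Hypothetical helper: reads an existing PDF and writes a copy with "Page X of Y" footers.
public static class PageNumberStamper
{
    public static void AddPageNumbers(string sourcePath, string targetPath)
    {
        // Opening a PdfDocument with both a reader and a writer puts it in stamping mode.
        var pdf = new PdfDocument(new PdfReader(sourcePath), new PdfWriter(targetPath));
        var document = new Document(pdf);

        int pageCount = pdf.GetNumberOfPages();
        for (int page = 1; page <= pageCount; page++)
        {
            var pageSize = pdf.GetPage(page).GetPageSize();

            // Centered horizontally, 20 points above the bottom edge (assumed placement).
            document.ShowTextAligned(
                new Paragraph($"Page {page} of {pageCount}"),
                pageSize.GetWidth() / 2, 20, page,
                TextAlignment.CENTER, VerticalAlignment.BOTTOM, 0);
        }

        // Closing the layout Document also closes the underlying PdfDocument and flushes the file.
        document.Close();
    }
}
```

Inside OnAfterRender this would be called between writing out.pdf and opening final.pdf, for example `PageNumberStamper.AddPageNumbers("c://temp/out.pdf", "c://temp/final.pdf");`.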
However, page numbers should work whether or not you are running in Docker, so you shouldn't need to do it in this complicated way. The answer to another question covers this.
I'm trying to extract the text from the following PDF with the following code (using iText7 7.2.2) :
var source = (string)GetHttpResult("https://www.bcr.ro/content/dam/ro/bcr/www_bcr_ro/Aur/Cotatii_Aur.pdf", new CookieContainer());
var bytes = Encoding.UTF8.GetBytes(source);
var stream = new MemoryStream(bytes);
var reader = new PdfReader(stream);
var doc = new PdfDocument(reader);
var pages = doc.GetNumberOfPages();
var text = PdfTextExtractor.GetTextFromPage(doc.GetPage(1));
Loading the PDF in my browser (Edge 100.0) works fine.
GetHttpResult() is a simple HttpClient defining a custom CookieContainer, a custom UserAgent, and calling ReadAsStringAsync(). Nothing fancy.
source has the correct PDF content, starting with "%PDF-1.7".
pages has the correct number of pages, which is 2.
But, whatever I try, text is always empty.
Defining an explicit TextExtractionStrategy, trying some Encodings, extracting from all pages in a loop, ..., nothing matters, text is always empty, with no Exception thrown anywhere.
I suspect I am not reading this PDF the way it is "meant" to be read, but then what is the correct way (given the correct content in source, the correct number of pages, and no Exception anywhere)?
Thanks.
That's it! Thanks to mkl and KJ!
I first downloaded the PDF as a byte array, so I'm sure it is not modified in any way.
Then, as pdftotext is able to extract the text from this PDF, I searched for a NuGet package able to do the same. I tested almost ten of them, and FreeSpire.PDF finally did it!
Update: Actually, FreeSpire.PDF missed some words, so I finally found PdfPig, which is able to extract every single word.
Code using PdfPig:
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;
byte[] bytes;
using (HttpClient client = new())
{
bytes = client.GetByteArrayAsync("https://www.bcr.ro/content/dam/ro/bcr/www_bcr_ro/Aur/Cotatii_Aur.pdf").GetAwaiter().GetResult();
}
List<string> words = new();
using (PdfDocument document = PdfDocument.Open(bytes))
{
foreach (Page page in document.GetPages())
{
foreach (Word word in page.GetWords())
{
words.Add(word.Text);
}
}
}
string text = string.Join(" ", words);
Code using FreeSpire.PDF:
using Spire.Pdf;
using Spire.Pdf.Exporting.Text;
byte[] bytes;
using (HttpClient client = new())
{
bytes = client.GetByteArrayAsync("https://www.bcr.ro/content/dam/ro/bcr/www_bcr_ro/Aur/Cotatii_Aur.pdf").GetAwaiter().GetResult();
}
string text = string.Empty;
SimpleTextExtractionStrategy strategy = new();
using (PdfDocument doc = new())
{
doc.LoadFromBytes(bytes);
foreach (PdfPageBase page in doc.Pages)
{
text += page.ExtractText(strategy);
}
}
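As a side note on why downloading as a byte array matters: a PDF is binary data, and reading it as a string and then re-encoding it with Encoding.UTF8.GetBytes() is not a lossless round trip. The following is a minimal sketch using only the base class library; the class name and the sample bytes are made up for illustration, and it assumes the string was decoded as UTF-8.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryRoundTrip
{
    // Decodes bytes as UTF-8 text and encodes the text back to bytes,
    // mimicking a string download followed by Encoding.UTF8.GetBytes().
    public static byte[] ThroughUtf8String(byte[] data)
    {
        string asText = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(asText);
    }

    public static void Main()
    {
        // "%PDF" followed by four bytes that are not valid UTF-8 (made-up sample data).
        byte[] original = { 0x25, 0x50, 0x44, 0x46, 0xE2, 0xE3, 0xCF, 0xD3 };
        byte[] roundTripped = ThroughUtf8String(original);

        // Invalid sequences are replaced during decoding, so the bytes no longer match.
        Console.WriteLine(original.SequenceEqual(roundTripped)); // prints False
    }
}
```

The header and page tree can survive this because they are mostly ASCII, which may be why the page count was still correct while compressed content streams were no longer readable.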
I am trying to return a PDF file from my ASP.NET Core 2 controller.
I have this code
(mostly borrowed from this SO question):
var net = new System.Net.WebClient();
//a random pdf file link
var fileLocation = "https://syntera.io/documents/T&C.pdf";
var data = net.DownloadData(fileLocation);
MemoryStream content = null;
try
{
content = new MemoryStream(data);
return new FileStreamResult(content, "Application/octet-stream");
}
finally
{
content?.Dispose();
}
This code above is part of a service class that my controller calls. This is the code from my controller.
public async Task<IActionResult> DownloadFile(string fileName)
{
var result = await _downloader.DownloadFileAsync(fileName);
return result;
}
But I keep getting ObjectDisposedException: Cannot access a closed Stream.
The try and finally block was an attempt to fix it, taken from another SO question.
The main questions are: A) is this the right way to send a PDF file back to the browser, and B) if it isn't, how can I change the code to send the PDF to the browser?
Ideally, I don't want to first save the file on the server and then return it to the controller. I'd rather return it while keeping everything in memory.
The finally will always get called (even after the return) so it will always dispose of the content stream before it can be sent to the client, hence the error.
Ideally, I don't want to first save the file on the server and then return it to the controller. I'd rather return it while keeping everything in memory.
Use a FileContentResult class to take the raw byte array data and return it directly.
FileContentResult: Represents an ActionResult that when executed will write a binary file to the response.
async Task<IActionResult> DownloadFileAsync(string fileName){
using(var net = new System.Net.WebClient()) {
byte[] data = await net.DownloadDataTaskAsync(fileName);
return new FileContentResult(data, "application/pdf") {
FileDownloadName = "file_name_here.pdf"
};
}
}
No need for the additional memory stream
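The disposal problem described above can be reproduced without ASP.NET at all. The following is a minimal sketch with hypothetical names (`FinallyDisposeDemo`, `ReturnThenDispose`, `IsClosed`): it returns a stream from inside a try block whose finally disposes it, and then shows that the caller receives an already closed stream.

```csharp
using System;
using System.IO;

public static class FinallyDisposeDemo
{
    // Mirrors the question's pattern: the stream is returned, then disposed by finally.
    public static MemoryStream ReturnThenDispose(byte[] data)
    {
        MemoryStream content = null;
        try
        {
            content = new MemoryStream(data);
            return content;
        }
        finally
        {
            // Runs after the return value is chosen but before the caller can use it.
            content?.Dispose();
        }
    }

    // Returns true when reading fails because the stream is already closed.
    public static bool IsClosed(Stream stream)
    {
        try
        {
            stream.ReadByte();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsClosed(ReturnThenDispose(new byte[] { 1, 2, 3 }))); // prints True
        Console.WriteLine(IsClosed(new MemoryStream(new byte[] { 1, 2, 3 })));  // prints False
    }
}
```

A FileStreamResult only reads its stream later, when the framework executes the result, which is why the exception surfaces after the action has already returned.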
For the file to be opened directly in the browser, you must set the Content-Disposition header to inline (in ASP.NET Core the header is added through Response.Headers):
Response.Headers.Add("Content-Disposition", "inline; filename=file.pdf");
return new FileStreamResult(stream, "application/pdf");
I have a Web API controller method that receives document IDs and should return a separate document file for each requested ID. I tried the accepted answer from the following link to achieve this, but it is not working, and I don't know where I went wrong.
What's the best way to serve up multiple binary files from a single WebApi method?
My Web Api Method,
public async Task<HttpResponseMessage> DownloadMultiDocumentAsync(
IClaimedUser user, string documentId)
{
List<long> docIds = documentId.Split(',').Select(long.Parse).ToList();
List<Document> documentList = coreDataContext.Documents.Where(d => docIds.Contains(d.DocumentId) && d.IsActive).ToList();
var content = new MultipartContent();
CloudBlockBlob blob = null;
var container = GetBlobClient(tenantInfo);
var directory = container.GetDirectoryReference(
string.Format(DirectoryNameConfigValue, tenantInfo.TenantId.ToString(), documentList[0].ProjectId));
for (int docId = 0; docId < documentList.Count; docId++)
{
blob = directory.GetBlockBlobReference(DocumentNameConfigValue + documentList[docId].DocumentId);
if (!blob.Exists()) continue;
MemoryStream memStream = new MemoryStream();
await blob.DownloadToStreamAsync(memStream);
memStream.Seek(0, SeekOrigin.Begin);
var streamContent = new StreamContent(memStream);
content.Add(streamContent);
}
HttpResponseMessage httpResponseMessage = new HttpResponseMessage();
httpResponseMessage.Content = content;
httpResponseMessage.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
httpResponseMessage.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment");
httpResponseMessage.StatusCode = HttpStatusCode.OK;
return httpResponseMessage;
}
I tried with 2 or more document Ids but only one file was downloaded and that also is not in the correct format (without extension).
Zipping is the only option that will have consistent result on all browsers. MIME/multipart content is for email messages (https://en.wikipedia.org/wiki/MIME#Multipart_messages) and it was never intended to be received and parsed on the client side of a HTTP transaction. Some browsers do implement it, some others don't.
Alternatively, you can change your API to take in a single docId and iterate over your API from your client for each docId.
I think the only way is to zip all the files and then download a single zip file. You can use the DotNetZip package because it is easy to use.
One way is to first save the files to disk and then stream the zip for download. Another way is to zip them in memory and then return the archive as a stream:
public ActionResult Download()
{
using (ZipFile zip = new ZipFile())
{
zip.AddDirectory(Server.MapPath("~/Directories/hello"));
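MemoryStream output = new MemoryStream();
zip.Save(output);
// Rewind the stream, otherwise the response starts at the end and the download is empty
output.Position = 0;
return File(output, "application/zip", "sample.zip");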
}
}
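For the in-memory variant with individual documents rather than a directory, here is a minimal sketch. Note that it swaps in the built-in System.IO.Compression.ZipArchive instead of DotNetZip; the class name `InMemoryZip` and the dictionary of file names to contents are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class InMemoryZip
{
    // Builds a zip archive in memory from (entry name -> content) pairs
    // and returns a stream positioned at the start, ready to be sent.
    public static MemoryStream Create(IDictionary<string, byte[]> files)
    {
        var output = new MemoryStream();

        // leaveOpen: true keeps 'output' usable after the archive is disposed;
        // disposing the archive is what writes the zip central directory.
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var file in files)
            {
                ZipArchiveEntry entry = archive.CreateEntry(file.Key);
                using (Stream entryStream = entry.Open())
                {
                    entryStream.Write(file.Value, 0, file.Value.Length);
                }
            }
        }

        // Rewind so the consumer reads the archive from the beginning.
        output.Position = 0;
        return output;
    }
}
```

In an MVC action the result could then be returned with something like `return File(InMemoryZip.Create(files), "application/zip", "documents.zip");`, where `files` holds the downloaded blobs keyed by file name including the extension.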
I have a working solution to load and render a PDF document from a byte array in a Windows Store App. Lately some users have reported out-of-memory errors though. As you can see in the code below there is one stream I am not disposing of. I've commented out the line. If I do dispose of that stream, then the PDF document does not render anymore. It just shows a completely white image. Could anybody explain why and how I could load and render the PDF document and dispose of all disposables?
private static async Task<PdfDocument> LoadDocumentAsync(byte[] bytes)
{
using (var stream = new InMemoryRandomAccessStream())
{
await stream.WriteAsync(bytes.AsBuffer());
stream.Seek(0);
var fileStream = RandomAccessStreamReference.CreateFromStream(stream);
var inputStream = await fileStream.OpenReadAsync();
try
{
return await PdfDocument.LoadFromStreamAsync(inputStream);
}
finally
{
// do not dispose otherwise pdf does not load / render correctly. Not disposing though may cause memory issues.
// inputStream.Dispose();
}
}
}
and the code to render the PDF
private static async Task<ObservableCollection<BitmapImage>> RenderPagesAsync(
PdfDocument document,
PdfPageRenderOptions options)
{
var items = new ObservableCollection<BitmapImage>();
if (document != null && document.PageCount > 0)
{
for (var pageIndex = 0; pageIndex < document.PageCount; pageIndex++)
{
using (var page = document.GetPage((uint)pageIndex))
{
using (var imageStream = new InMemoryRandomAccessStream())
{
await page.RenderToStreamAsync(imageStream, options);
await imageStream.FlushAsync();
var renderStream = RandomAccessStreamReference.CreateFromStream(imageStream);
using (var stream = await renderStream.OpenReadAsync())
{
var bitmapImage = new BitmapImage();
await bitmapImage.SetSourceAsync(stream);
items.Add(bitmapImage);
}
}
}
}
}
return items;
}
As you can see I am using this RandomAccessStreamReference.CreateFromStream method in both of my methods. I've seen other examples that skip that step and use the InMemoryRandomAccessStream directly to load the PDF document or the bitmap image, but I've not managed to get the PDF to render correctly then. The images will just be completely white again. As I mentioned above, this code does actually render the PDF correctly, but does not dispose of all disposables.
Why
I assume LoadFromStreamAsync(IRandomAccessStream) does not parse the whole stream into the PdfDocument object but instead only parses the main PDF dictionaries and holds a reference to the IRandomAccessStream.
This actually is the sane thing to do, why parse the whole PDF into own objects (a possibly very expensive operation resource-wise) if the user eventually only wants to render one page, or even merely wants to query the number of pages...
Later on, when other methods of the returned PdfDocument are called, e.g. GetPage, these methods try to read the additional data from the stream they need for their task, e.g. for rendering. Unfortunately in your case that means after the finally { inputStream.Dispose(); }
How else
You have to postpone the inputStream.Dispose() until all operations on the PdfDocument are finished. That means some hopefully minor architectural changes for your code. Probably moving the LoadDocumentAsync code as a frame into the RenderPagesAsync method or its caller suffices.
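One way to read that suggestion is sketched below: loading becomes a frame around rendering, so that every stream is disposed only after the last page has been rendered. It uses only the calls already present in the question; the class name `PdfRendering` and the method name `LoadAndRenderAsync` are hypothetical, and the sketch has not been run against the original app.

```csharp
using System;
using System.Collections.ObjectModel;
using System.Runtime.InteropServices.WindowsRuntime; // AsBuffer()
using System.Threading.Tasks;
using Windows.Data.Pdf;
using Windows.Storage.Streams;
using Windows.UI.Xaml.Media.Imaging;

public static class PdfRendering
{
    public static async Task<ObservableCollection<BitmapImage>> LoadAndRenderAsync(
        byte[] bytes, PdfPageRenderOptions options)
    {
        using (var stream = new InMemoryRandomAccessStream())
        {
            await stream.WriteAsync(bytes.AsBuffer());
            stream.Seek(0);
            var reference = RandomAccessStreamReference.CreateFromStream(stream);

            // inputStream stays open while the document is in use and is
            // disposed only when this using block ends, after rendering.
            using (var inputStream = await reference.OpenReadAsync())
            {
                var document = await PdfDocument.LoadFromStreamAsync(inputStream);
                return await RenderPagesAsync(document, options);
            }
        }
    }

    // Same rendering logic as in the question.
    private static async Task<ObservableCollection<BitmapImage>> RenderPagesAsync(
        PdfDocument document, PdfPageRenderOptions options)
    {
        var items = new ObservableCollection<BitmapImage>();
        for (uint pageIndex = 0; pageIndex < document.PageCount; pageIndex++)
        {
            using (var page = document.GetPage(pageIndex))
            using (var imageStream = new InMemoryRandomAccessStream())
            {
                await page.RenderToStreamAsync(imageStream, options);
                await imageStream.FlushAsync();
                var renderReference = RandomAccessStreamReference.CreateFromStream(imageStream);
                using (var renderStream = await renderReference.OpenReadAsync())
                {
                    var bitmapImage = new BitmapImage();
                    await bitmapImage.SetSourceAsync(renderStream);
                    items.Add(bitmapImage);
                }
            }
        }
        return items;
    }
}
```

The trade-off is that the PdfDocument never escapes the method, so callers that need it for other purposes (for example to query the page count later) would have to keep the streams alive for as long as they hold the document.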
I need to upload a file using Stream (Azure Blobstorage), and just cannot find out how to get the stream from the object itself. See code below.
I'm new to the WebAPI and have used some examples. I'm getting the files and filedata, but it's not correct type for my methods to upload it. Therefore, I need to get or convert it into a normal Stream, which seems a bit hard at the moment :)
I know I need to use ReadAsStreamAsync().Result in some way, but it crashes in the foreach loop since I'm getting two provider.Contents (first one seems right, second one does not).
[System.Web.Http.HttpPost]
public async Task<HttpResponseMessage> Upload()
{
if (!Request.Content.IsMimeMultipartContent())
{
return this.Request.CreateResponse(HttpStatusCode.UnsupportedMediaType);
}
var provider = GetMultipartProvider();
var result = await Request.Content.ReadAsMultipartAsync(provider);
// On upload, files are given a generic name like "BodyPart_26d6abe1-3ae1-416a-9429-b35f15e6e5d5"
// so this is how you can get the original file name
var originalFileName = GetDeserializedFileName(result.FileData.First());
// uploadedFileInfo object will give you some additional stuff like file length,
// creation time, directory name, a few filesystem methods etc..
var uploadedFileInfo = new FileInfo(result.FileData.First().LocalFileName);
// Remove this line as well as GetFormData method if you're not
// sending any form data with your upload request
var fileUploadObj = GetFormData<UploadDataModel>(result);
Stream filestream = null;
using (Stream stream = new MemoryStream())
{
foreach (HttpContent content in provider.Contents)
{
BinaryFormatter bFormatter = new BinaryFormatter();
bFormatter.Serialize(stream, content.ReadAsStreamAsync().Result);
stream.Position = 0;
filestream = stream;
}
}
var storage = new StorageServices();
storage.UploadBlob(filestream, originalFileName);
private MultipartFormDataStreamProvider GetMultipartProvider()
{
var uploadFolder = "~/App_Data/Tmp/FileUploads"; // you could put this to web.config
var root = HttpContext.Current.Server.MapPath(uploadFolder);
Directory.CreateDirectory(root);
return new MultipartFormDataStreamProvider(root);
}
This is identical to a dilemma I had a few months ago (capturing the upload stream before the MultipartStreamProvider took over and auto-magically saved the stream to a file). The recommendation was to inherit that class and override the methods ... but that didn't work in my case. :( (I wanted the functionality of both the MultipartFileStreamProvider and MultipartFormDataStreamProvider rolled into one MultipartStreamProvider, without the autosave part).
This might help; here's one written by one of the Web API developers, and this from the same developer.
Hi, I just wanted to post my answer so that anybody who encounters the same issue can find a solution here:
MultipartMemoryStreamProvider stream = await this.Request.Content.ReadAsMultipartAsync();
foreach (var st in stream.Contents)
{
var fileBytes = await st.ReadAsByteArrayAsync();
string base64 = Convert.ToBase64String(fileBytes);
var contentHeader = st.Headers;
string filename = contentHeader.ContentDisposition.FileName.Replace("\"", "");
string filetype = contentHeader.ContentType.MediaType;
}
I used MultipartMemoryStreamProvider and got all the details like filename and filetype from the header of content.
Hope this helps someone.