I have a form in ASP.NET, and when I fill it out, the last step generates a PDF file (I used jsPDF).
What I want is for the generated PDF file to be sent (saved) to Azure Storage. Can anyone help me?
Thank you.
UPDATE: This is the code I'm trying. It runs, but it only extracts the text; it doesn't save the PDF as it is:
var account = new CloudStorageAccount(
    new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials("storageaccount", "accesskey"), true);
var blobClient = account.CreateCloudBlobClient();
var container = blobClient.GetContainerReference("folderpath");

StringBuilder text = new StringBuilder();
string filePath = "C:\\Users\\username\\Desktop\\toPDF\\testing PDFs\\test.pdf";
if (File.Exists(filePath))
{
    PdfReader pdfReader = new PdfReader(filePath);
    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
    {
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
        currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
        text.Append(currentText);
    }
    pdfReader.Close();

    using (MemoryStream ms = new MemoryStream())
    {
        using (var doc = new iTextSharp.text.Document())
        {
            PdfWriter writer = PdfWriter.GetInstance(doc, ms);
            doc.Open();
            doc.Add(new Paragraph(text.ToString()));
        }
        var byteArray = ms.ToArray();
        var blobName = "test.pdf";
        var blob = container.GetBlockBlobReference(blobName);
        blob.Properties.ContentType = "application/pdf";
        blob.UploadFromByteArray(byteArray, 0, byteArray.Length);
    }
}
I found a simple solution. This is what the code does:
string filePath = "C:\\Users\\username\\Desktop\\toPDF\\testing PDFs\\rpa.pdf";
var credentials = new StorageCredentials("storageaccount", "accesskey");
var client = new CloudBlobClient(new Uri("https://jpllanatest.blob.core.windows.net/"), credentials);

// Retrieve a reference to a container. (You need to create one using the management portal, or call container.CreateIfNotExists().)
var container = client.GetContainerReference("folderpath");

// Retrieve a reference to a blob named "myfile.pdf".
var blockBlob = container.GetBlockBlobReference("myfile.pdf");

// Create or overwrite the blob with contents from a local file.
using (var fileStream = System.IO.File.OpenRead(filePath))
{
    blockBlob.UploadFromStream(fileStream);
}
Click on your solution in Visual Studio, then Add => Add Connected Service => Select Azure Storage, and go through the wizard (if you need to, create the storage account there; the wizard has that option). After that, your solution will be configured with all the needed settings (connection string included), and VS will open a page in your browser with a detailed tutorial on how to use Azure Storage. As it has all the information and code pieces needed, I will not include them here (they will likely change in the future, so this avoids deprecated information).
Tutorial about Add Connected Service => Azure Storage functionality.
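For completeness, a minimal sketch of what using the wizard-generated connection string looks like afterwards. This assumes the WindowsAzure.Storage package; the app-setting key, container name, and file path below are placeholders, not values the wizard produces:

```csharp
// Sketch: upload a local PDF using the connection string the wizard added to config.
// "StorageConnectionString", "pdfs", and the local path are placeholder names.
var account = CloudStorageAccount.Parse(
    System.Configuration.ConfigurationManager.AppSettings["StorageConnectionString"]);
var client = account.CreateCloudBlobClient();
var container = client.GetContainerReference("pdfs");
container.CreateIfNotExists();

var blob = container.GetBlockBlobReference("form-output.pdf");
blob.Properties.ContentType = "application/pdf";
using (var fs = System.IO.File.OpenRead(@"C:\temp\form-output.pdf"))
{
    blob.UploadFromStream(fs);
}
```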
I'm saving the contents of my PDF file (pdfAsString) to the database.
The file is of type IFormFile (a file uploaded by the user).
string pdfAsString;
using (var reader = new StreamReader(indexModel.UploadModel.File.OpenReadStream()))
{
    pdfAsString = await reader.ReadToEndAsync();
    // pdfAsString = ; // encoding function or lack thereof
}
Later I'm trying to fetch these contents and use them to initialize a new instance of MemoryStream, then use that to create a PdfReader and then a PdfDocument, but at this point I get a 'Trailer not found' exception. I have verified that the trailer part of the PDF is present inside the contents of the string I use to create the MemoryStream. I have also made sure the position is set to the beginning of the stream.
The issue seems related to the format of the PDF contents fetched from the database; iText 7 doesn't seem able to navigate anywhere past the beginning of the file.
I'm expecting to be able to create an instance of PdfDocument with the contents of the PDF saved to my database.
Note 1: Using the Stream created from OpenReadStream() works when trying to create a PdfReader and then PdfDocument, but I don't have access to that IFormFile when reading from the DB, so this doesn't help me in my use case.
Note 2: If I use the PDF from my device by giving a path, it works correctly, same for using a FileStream created from a path. However, this doesn't help my use case.
So far, I've tried saving it raw and then using that right out of the gate (1) or encoding special symbols like \n \t to ASCII hexadecimal notation (2). I've also tried HttpUtility.UrlEncode on save and UrlDecode after getting the database record (3), and also tried ToBase64String on save and FromBase64String on get (4).
// var pdfContent = databaseString; // 1
// var pdfContent = databaseString.EncodeSpecialCharacters(); // encode special symbols // 2
// var pdfContent = HttpUtility.UrlDecode(databaseString); // decode urlencoded string // 3
var pdfContent = Convert.FromBase64String(databaseString); // decode base64 // 4
using (var stream = new MemoryStream(pdfContent))
{
    PdfReader pdfReader = new PdfReader(stream).SetUnethicalReading(true);
    PdfWriter pdfWriter = new PdfWriter("new-file.pdf");
    PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter); // exception here :(
    // some business logic...
}
Any help would be appreciated.
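One observation: the 'Trailer not found' error is consistent with binary corruption from reading the PDF as text. A StreamReader decodes the bytes as characters, and re-encoding that string does not reproduce the original bytes, because byte sequences that are invalid in the decoder's encoding get replaced. A minimal stand-alone sketch of the effect (no PDF library involved; the byte values are made up for illustration):

```csharp
using System;
using System.IO;
using System.Text;

// PDFs contain arbitrary binary data. Decoding those bytes as text and
// re-encoding the resulting string is lossy: invalid sequences become U+FFFD.
byte[] original = { 0x25, 0x50, 0x44, 0x46, 0xFF, 0xD8, 0x00, 0x81 }; // "%PDF" + binary

string asText;
using (var reader = new StreamReader(new MemoryStream(original)))
{
    asText = reader.ReadToEnd();
}
byte[] roundTripped = Encoding.UTF8.GetBytes(asText);

Console.WriteLine(original.Length);      // 8
Console.WriteLine(roundTripped.Length);  // not 8: the invalid bytes were replaced
```

This is why reading the upload with a byte-oriented reader (as in the answer below) keeps the PDF intact, while StreamReader does not.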
EDIT: in a separate project, I'm trying to run this code:
using (var stream = File.OpenRead("C:\\<path>\\<filename>.pdf"))
{
    var formFile = new FormFile(stream, 0, stream.Length, null, "<filename>.pdf");
    var reader = new StreamReader(formFile.OpenReadStream());
    var pdfAsString = reader.ReadToEnd();
    var pdfAsBytes = Encoding.UTF8.GetBytes(pdfAsString);
    using (var newStream = new MemoryStream(pdfAsBytes))
    {
        newStream.Seek(0, SeekOrigin.Begin);
        var pdfReader = new PdfReader(newStream).SetUnethicalReading(true);
        var pdfWriter = new PdfWriter("Test-PDF-1.pdf");
        PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter);
        PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
        IDictionary<string, PdfFormField> fields = form.GetFormFields();
        foreach (var field in fields)
        {
            field.Value.SetValue(field.Key);
        }
        //form.FlattenFields();
        pdf.Close();
    }
}
If I replace "newStream" inside the PdfReader with formFile.OpenReadStream(), it works fine; otherwise I get the 'Trailer not found' exception.
Answer: use BinaryReader and ReadBytes instead of StreamReader when initially reading the data, so the bytes are never decoded as text. Example below:
using (var stream = File.OpenRead("C:\\<filepath>\\<filename>.pdf"))
{
    // FormFile - my starting point inside of the web application
    var formFile = new FormFile(stream, 0, stream.Length, null, "<filename>.pdf");
    var reader = new BinaryReader(formFile.OpenReadStream());
    var pdfAsBytes = reader.ReadBytes((int)formFile.Length); // store this in the database
    using (var newStream = new MemoryStream(pdfAsBytes))
    {
        newStream.Seek(0, SeekOrigin.Begin);
        var pdfReader = new PdfReader(newStream).SetUnethicalReading(true);
        var pdfWriter = new PdfWriter("Test-PDF-1.pdf");
        PdfDocument pdf = new PdfDocument(pdfReader, pdfWriter);
        PdfAcroForm form = PdfAcroForm.GetAcroForm(pdf, true);
        IDictionary<string, PdfFormField> fields = form.GetFormFields();
        foreach (var field in fields)
        {
            field.Value.SetValue(field.Key);
        }
        //form.FlattenFields();
        pdf.Close();
    }
}
I'm working on a Blazor WASM app, and I want my users to easily open PDF files on specific pages that contain additional information.
I cannot distribute those files myself or upload them to any kind of server; each user has to provide them themselves.
Because the files are up to 60 MB in size, I cannot convert the uploaded file to base64 and display it as described here.
However, I don't have to display the whole file and could just load the needed page, plus or minus a few pages around it.
For that I tried using iText 7's ExtractPageRange(). This answer indicates that I have to override the GetNextPdfWriter() method and store all streams in a collection.
class ByteArrayPdfSplitter : PdfSplitter
{
    public ByteArrayPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument)
    {
    }

    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        CurrentMemoryStream = new MemoryStream();
        UsedStreams.Add(CurrentMemoryStream);
        return new PdfWriter(CurrentMemoryStream);
    }

    public MemoryStream CurrentMemoryStream { get; private set; }
    public List<MemoryStream> UsedStreams { get; set; } = new List<MemoryStream>();
}
Then I thought I could merge those streams and convert them to base64:
var file = loadedFiles.First();
using (MemoryStream ms = new MemoryStream())
{
    var rs = file.OpenReadStream(maxFileSize);
    await rs.CopyToAsync(ms);
    ms.Position = 0;
    // rs needed to be copied into ms, because the PdfReader constructor uses a
    // synchronous read that isn't supported by rs and throws an exception.
    PdfReader pdfReader = new PdfReader(ms);
    var document = new PdfDocument(pdfReader);
    var splitter = new ByteArrayPdfSplitter(document);
    var range = new PageRange();
    range.AddPageSequence(1, 10);
    var splitDoc = splitter.ExtractPageRange(range);
    // Edit: commented this out, shouldn't have been here at all; leads to an exception
    //splitDoc.Close();
    var outputMs = new MemoryStream();
    foreach (var usedMs in splitter.UsedStreams)
    {
        usedMs.Position = 0;
        outputMs.Position = outputMs.Length;
        await usedMs.CopyToAsync(outputMs);
    }
    var data = outputMs.ToArray();
    currentPdfContent = "data:application/pdf;base64,";
    currentPdfContent += Convert.ToBase64String(data);
    pdfLoaded = true;
}
This, however, doesn't work.
Does anyone have a suggestion for getting this working, or maybe a simpler solution I could try?
Edit:
I took a closer look in debug, and it seems like the resulting stream outputMs is always empty, so it is probably a problem with how I split the PDF.
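Two things in the snippet above seem worth checking (the sketch below is an assumption based on how PdfSplitter normally behaves, not a verified fix). First, the extracted PdfDocument has to be closed before its PdfWriter flushes anything into CurrentMemoryStream, so reading UsedStreams without closing it yields empty streams; if Close() throws, the exception message would point at the real underlying problem. Second, byte-concatenating several complete PDF files does not produce one valid PDF: each stream is a stand-alone document, so only a single extracted range should be turned into the data URL.

```csharp
// Sketch: close the extracted document so its writer flushes,
// then use that single stream (do not byte-concatenate several PDFs).
var splitDoc = splitter.ExtractPageRange(range);
splitDoc.Close(); // without this, CurrentMemoryStream stays empty

var data = splitter.CurrentMemoryStream.ToArray();
currentPdfContent = "data:application/pdf;base64," + Convert.ToBase64String(data);
```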
After at least partially clearing up my misconception about what it means to not be able to access the file system from Blazor WASM, I managed to find a working solution.
await using MemoryStream ms = new MemoryStream();
var rs = file.OpenReadStream(maxFileSize);
await using var fs = new FileStream("test.pdf", FileMode.Create);
fs.Position = 0;
await rs.CopyToAsync(fs);
fs.Close();

string path = "test.pdf";
string range = "10 - 15";
var pdfDocument = new PdfDocument(new PdfReader(path));
var split = new MySplitter(pdfDocument);
var result = split.ExtractPageRange(new PageRange(range));
result.Close();

await using var splitFs = new FileStream("split.pdf", FileMode.Open);
await splitFs.CopyToAsync(ms);

var data = ms.ToArray();
var pdfContent = "data:application/pdf;base64,";
pdfContent += System.Convert.ToBase64String(data);
Console.WriteLine(pdfContent);
currentPdfContent = pdfContent;
This uses the MySplitter class from this answer:
class MySplitter : PdfSplitter
{
    public MySplitter(PdfDocument pdfDocument) : base(pdfDocument)
    {
    }

    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        String toFile = "split.pdf";
        return new PdfWriter(toFile);
    }
}
I have a web API which receives multiple files, which are images. At the moment I save each image to my local disk, perform some methods on it, and then upload the local copy to Azure Storage. What I want to be able to do is:
1) Get the image directly (if possible) and then perform the methods on it.
2) Save the PDF directly to Azure Storage.
My code looks like this:
public static class Imageupload
{
    [FunctionName("Imageupload")]
    public static async System.Threading.Tasks.Task<HttpResponseMessage> RunAsync([HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = "HttpTriggerCSharp/name/{name}")]HttpRequestMessage req, string name, TraceWriter log)
    {
        // Check if the request contains multipart/form-data.
        if (!req.Content.IsMimeMultipartContent())
        {
            return req.CreateResponse(HttpStatusCode.UnsupportedMediaType);
        }

        var storageConnectionString = "XXXXXXXXXXXXXXXXXXXXXXXXXX";
        var storageAccount = CloudStorageAccount.Parse(storageConnectionString);
        log.Info(storageConnectionString);
        var blobClient = storageAccount.CreateCloudBlobClient();

        // Retrieve a reference to a container.
        CloudBlobContainer container = blobClient.GetContainerReference("temporary-images");
        // Create the container if it doesn't already exist.
        container.CreateIfNotExists();

        // Retrieve reference to a blob named "myblob".
        CloudBlockBlob blockBlob = container.GetBlockBlobReference("images");

        // The root path where the content of MIME multipart body parts is written to.
        var provider = new MultipartFormDataStreamProvider(@"C:\Users\Al\Desktop\Images\");
        var s = blockBlob.Uri.AbsoluteUri;
        await req.Content.ReadAsMultipartAsync(provider);

        // Test function for Aspose: instantiate a Document object
        var pdf = new Aspose.Pdf.Document();
        // Add a page to the document
        var pdfImageSection = pdf.Pages.Add();

        List<UploadedFileInfo> files = new List<UploadedFileInfo>();
        // This illustrates how to get the file names.
        foreach (MultipartFileData file in provider.FileData)
        {
            var fileInfo = new FileInfo(file.Headers.ContentDisposition.FileName.Trim('"'));
            files.Add(new UploadedFileInfo()
            {
                FileName = fileInfo.Name,
                ContentType = file.Headers.ContentType.MediaType,
                FileExtension = fileInfo.Extension,
                FileURL = file.LocalFileName
            });

            // Iterate through multiple images
            FileStream stream = new FileStream(file.LocalFileName, FileMode.Open);
            System.Drawing.Image img = new System.Drawing.Bitmap(stream);
            var image = new Aspose.Pdf.Image { ImageStream = stream };
            // Set appearance properties
            image.FixHeight = 300;
            image.FixWidth = 300;
            // Set margins for proper spacing and alignment
            image.Margin = new MarginInfo(5, 10, 5, 10);
            // Add the image to the paragraphs of the document
            pdfImageSection.Paragraphs.Add(image);
        }

        // Save the resultant document
        pdf.Save(@"C:\Users\Al\Desktop\Images\Image2Pdf_out.pdf");
        var ss = pdf.FileName;
        blockBlob.UploadFromFile(@"C:\Users\Al\Desktop\Images\Image2Pdf_out.pdf");

        return req.CreateResponse(HttpStatusCode.OK, files, "application/json");
        //return req.CreateResponse(HttpStatusCode.OK, "It's working", "application/json");
    }
}
Now this works when I save to my local disk and then upload to Azure, but saving to the local disk won't work after I publish the function to the Azure portal.
So I want to get the images, add them to a PDF, and upload that to Azure Storage. How can I do this?
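One possible direction (a sketch, not a tested solution: it assumes Aspose.Pdf's Document.Save(Stream) overload and reuses the blockBlob reference from the function above) is to keep the document in memory and upload the stream directly, so no local path is needed. The multipart content could likewise be read in memory with a MultipartMemoryStreamProvider instead of the disk-backed MultipartFormDataStreamProvider:

```csharp
// Sketch: save the assembled PDF to a MemoryStream and upload it without touching disk.
using (var ms = new MemoryStream())
{
    pdf.Save(ms);            // Aspose.Pdf can save to a Stream
    ms.Position = 0;         // rewind before handing the stream to the SDK
    blockBlob.Properties.ContentType = "application/pdf";
    blockBlob.UploadFromStream(ms);
}
```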
I have a PDF which is generated by iTextSharp in C#. It's a template PDF which gets some additional lines of text added using a stamper, is then pushed to S3, and is finally returned to the browser as a file stream (using mvc.net).
The newly added lines work fine when the PDF is viewed in the browser (Chrome), but when I download the PDF and open it locally (with Preview or Adobe Acrobat on Mac), only the template is showing, and the newly added lines are gone.
What could cause this?
Here's a (condensed) code example:
using (var receiptTemplateStream = GetType().Assembly.GetManifestResourceStream("XXXXX.DepositReceipts.Receipt.pdf"))
{
    var reader = new PdfReader(receiptTemplateStream);
    var outputPdfStream = new MemoryStream();
    var stamper = new PdfStamper(reader, outputPdfStream) { FormFlattening = true, FreeTextFlattening = true };
    var _pbover = stamper.GetOverContent(1);

    using (var latoLightStream = GetType().Assembly.GetManifestResourceStream("XXXXX.DepositReceipts.Fonts.Lato-Light.ttf"))
    using (var latoLightMS = new MemoryStream())
    {
        // latoLight (a font built from latoLightStream) is elided in this condensed example.
        _pbover.SetFontAndSize(latoLight, 11.0f);
        var verticalPosition = 650;
        _pbover.ShowTextAligned(0, account.company_name, 45, verticalPosition, 0);
        verticalPosition = verticalPosition - 15;

        var filename = "Receipt 0001.pdf";
        stamper.SetFullCompression();
        stamper.Close();
        var file = outputPdfStream.ToArray();

        using (var output = new MemoryStream())
        {
            output.Write(file, 0, file.Length);
            output.Position = 0;
            var response = await _s3Client.PutObjectAsync(new PutObjectRequest()
            {
                InputStream = output,
                BucketName = "XXXX",
                CannedACL = S3CannedACL.Private,
                Key = filename
            });
        }
        return filename;
    }
}
This was a funky one!
I had another method in the same solution that worked without problems. It turns out that the template PDF that I load and write my content over was the issue.
The template I used was generated in Adobe Illustrator; I had another one, generated in Adobe InDesign, which worked.
When I pulled the problematic template PDF into InDesign and then exported it again (from InDesign), it suddenly worked.
I'm not sure exactly what caused this issue, but it must be some sort of encoding problem.
So I'm using the fancy EPPlus library to write an Excel file and output it to the user to download. For the following method I'm just using some test data to minimize the code; I'll add the database-connection code later. I can download a file fine, but when I go to open it, Excel complains that it's not a valid file and might be corrupted, and when I look at the file it says it's 0 KB. So my question is: where am I going wrong? I'm assuming it's the MemoryStream. I haven't done much work with streams before, so I'm not exactly sure what to use here. Any help would be appreciated!
[Authorize]
public ActionResult Download_PERS936AB()
{
    ExcelPackage pck = new ExcelPackage();
    var ws = pck.Workbook.Worksheets.Add("Sample1");
    ws.Cells["A1"].Value = "Sample 1";
    ws.Cells["A1"].Style.Font.Bold = true;
    var shape = ws.Drawings.AddShape("Shape1", eShapeStyle.Rect);
    shape.SetPosition(50, 200);
    shape.SetSize(200, 100);
    shape.Text = "Sample 1 text text text";
    var memorystream = new MemoryStream();
    pck.SaveAs(memorystream);
    return new FileStreamResult(memorystream, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet") { FileDownloadName = "PERS936AB.xlsx" };
}
Here's what I'm using. I've been using this for several months now and haven't had an issue:
public ActionResult ChargeSummaryData(ChargeSummaryRptParams rptParams)
{
    var fileDownloadName = "sample.xlsx";
    var contentType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    var package = CreatePivotTable(rptParams);
    var fileStream = new MemoryStream();
    package.SaveAs(fileStream);
    fileStream.Position = 0;
    var fsr = new FileStreamResult(fileStream, contentType);
    fsr.FileDownloadName = fileDownloadName;
    return fsr;
}
One thing I noticed right off the bat is that you don't reset your file stream position back to 0.
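Applied to the method in the question, that fix is a one-line addition before returning (everything else stays as in the original):

```csharp
var memorystream = new MemoryStream();
pck.SaveAs(memorystream);
memorystream.Position = 0; // rewind; otherwise FileStreamResult streams zero bytes
return new FileStreamResult(memorystream, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
{
    FileDownloadName = "PERS936AB.xlsx"
};
```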