Create PDF from existing PDF with Azure storage - C#

I made a bot application with the Microsoft Botbuilder. Now I want to create a PDF file from the user input. The file should be stored in my Azure storage.
I have a "PDF template" which should be copied and modified (this file is in the Azure storage already). It has some textboxes which should be filled with the user input. I already wrote the code for that with iTextSharp.
But I need a FileStream for this code. Does anybody know how to get a FileStream from the file in my Azure storage? Or is there maybe another way to accomplish my task?
Edit:
Here is the code where I need the filestream
string fileNameExisting = Path.Combine(Directory.GetCurrentDirectory(), "Some.pdf");
string fileNameNew = @"Path/Some2.pdf";

var inv = new Invention
{
    Inventor = new Inventor { Firstname = "TEST!", Lastname = "TEST!" },
    Date = DateTime.Now,
    Title = "TEST",
    Slogan = "TEST!",
    Description = "TEST!",
    Advantages = "TEST!s",
    TaskPosition = "TEST!",
    TaskSolution = "TEST!"
};

using (var existingFileStream = new FileStream(fileNameExisting, FileMode.Open))
using (var newFileStream = new FileStream(fileNameNew, FileMode.Create))
{
    // Open the existing PDF
    var pdfReader = new PdfReader(existingFileStream);

    // PdfStamper will write the filled-in copy to the new file
    var stamper = new PdfStamper(pdfReader, newFileStream);

    var form = stamper.AcroFields;
    var fieldKeys = form.Fields.Keys;
    foreach (string fieldKey in fieldKeys)
    {
        var props = fieldKey.Split('.');
        string t = GetProp(props, inv);
        form.SetField(fieldKey, t);
    }
    stamper.Close();
    pdfReader.Close();
}

public static string GetProp(string[] classes, object oldObj)
{
    var obj = oldObj.GetType().GetProperty(classes[0]).GetValue(oldObj, null);
    if (classes.Length > 1)
    {
        classes = classes.Skip(1).ToArray();
        return GetProp(classes, obj);
    }
    Console.WriteLine(obj.ToString());
    return obj.ToString();
}

The PdfReader constructor also takes a byte array. You should be able to create the object using something like:
var pdfTemplateBytes = await new WebClient().DownloadDataTaskAsync("https://myaccount.blob.core.windows.net/templates/mytemplate.pdf");
var pdfReader = new PdfReader(pdfTemplateBytes);
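If the template lives in Azure Blob Storage, you can equally download it into a MemoryStream with the storage client and hand that stream straight to PdfReader. A minimal sketch, assuming the WindowsAzure.Storage package and placeholder container/blob names:

// Sketch: download the PDF template blob into memory and open it with iTextSharp.
// "templates" and "mytemplate.pdf" are placeholders for your own container and blob names.
CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("templates");
CloudBlockBlob blob = container.GetBlockBlobReference("mytemplate.pdf");

using (var templateStream = new MemoryStream())
{
    await blob.DownloadToStreamAsync(templateStream);
    templateStream.Position = 0;

    var pdfReader = new PdfReader(templateStream);
    // ... fill the AcroFields as in the question, writing the result to another
    // MemoryStream that can be uploaded back to blob storage with UploadFromStreamAsync.
}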

Related

WordprocessingDocument doesn't save as expected in the memory stream

I am trying to make modifications to a Word document. For some reason the row that I added doesn't get saved in the memory stream. What am I doing wrong? There are no errors, the changes are just not saved.
public void Generate()
{
    byte[] templateDoc = File.ReadAllBytes(@"C:\Desktop\Test\11.docx");
    using (MemoryStream stream = new MemoryStream())
    {
        stream.Write(templateDoc, 0, templateDoc.Length);
        using (var document1 = WordprocessingDocument.Open(stream, isEditable: true))
        {
            foreach (var item in document1.MainDocumentPart.Document.Body)
            {
                if (item.InnerText.Contains("<test>"))
                {
                    DocumentFormat.OpenXml.Wordprocessing.Table table = new DocumentFormat.OpenXml.Wordprocessing.Table();
                    DocumentFormat.OpenXml.Wordprocessing.TableRow tr1 = new DocumentFormat.OpenXml.Wordprocessing.TableRow();
                    DocumentFormat.OpenXml.Wordprocessing.TableCell tc1 = new DocumentFormat.OpenXml.Wordprocessing.TableCell();
                    tc1.Append(new TableCellProperties(new TableCellWidth() { Type = TableWidthUnitValues.Pct, Width = "50" }));
                    tc1.Append(new DocumentFormat.OpenXml.Wordprocessing.Paragraph(new Run(new Text("Input 1:"))));
                    tr1.Append(tc1);
                    table.Append(tr1);
                    document1.MainDocumentPart.Document.Body.Append(table);
                }
            }
            File.WriteAllBytes(@"C:\Desktop\Test\22.docx", stream.ToArray());
        }
    }
}
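For reference, a WordprocessingDocument only flushes its changes back into the MemoryStream when its parts are saved or the document is disposed, so the likely fix is to save (or close) the document before copying the stream out. A minimal sketch of the corrected tail of the method, assuming the rest stays as above:

using (MemoryStream stream = new MemoryStream())
{
    stream.Write(templateDoc, 0, templateDoc.Length);
    using (var document1 = WordprocessingDocument.Open(stream, isEditable: true))
    {
        // ... modify the body as above ...
        document1.MainDocumentPart.Document.Save(); // flush the edited main part back into the package
    } // disposing the document commits the package to the MemoryStream
    File.WriteAllBytes(@"C:\Desktop\Test\22.docx", stream.ToArray());
}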

How to improve performance of CSV upload via datatable

I have a working solution for uploading a CSV file. Currently, I use the IFormCollection for a user to upload multiple CSV files from a view.
The CSV files are saved as a temp file as follows:
List<string> fileLocations = new List<string>();
foreach (var formFile in files)
{
    filePath = Path.GetTempFileName();
    if (formFile.Length > 0)
    {
        using (var stream = new FileStream(filePath, FileMode.Create))
        {
            await formFile.CopyToAsync(stream);
        }
    }
    fileLocations.Add(filePath);
}
I send the list of file locations to another method (just below). I loop through the file locations and stream the data from the temp files; I then use a DataTable and SqlBulkCopy to insert the data. I currently upload between 50 and 200 files at a time, and each file is around 330KB. Inserting a hundred files (around 30-35MB in total) takes around 6 minutes.
public void SplitCsvData(string fileLocation, Guid uid)
{
    MetaDataModel MetaDatas;
    List<RawDataModel> RawDatas;

    var reader = new StreamReader(File.OpenRead(fileLocation));
    List<string> listRows = new List<string>();
    while (!reader.EndOfStream)
    {
        listRows.Add(reader.ReadLine());
    }

    var metaData = new List<string>();
    var rawData = new List<string>();

    foreach (var row in listRows)
    {
        var rowName = row.Split(',')[0];
        bool parsed = int.TryParse(rowName, out int result);

        if (parsed == false)
        {
            metaData.Add(row);
        }
        else
        {
            rawData.Add(row);
        }
    }

    //Assigns the vertical header name and value to the object by splitting string
    RawDatas = GetRawData.SplitRawData(rawData);
    SaveRawData(RawDatas);

    MetaDatas = GetMetaData.SplitRawData(rawData);
    SaveRawData(RawDatas);
}
This code then passes the object on to the code below to create the DataTable and insert the data.
private DataTable CreateRawDataTable
{
    get
    {
        var dt = new DataTable();
        dt.Columns.Add("Id", typeof(int));
        dt.Columns.Add("SerialNumber", typeof(string));
        dt.Columns.Add("ReadingNumber", typeof(int));
        dt.Columns.Add("ReadingDate", typeof(string));
        dt.Columns.Add("ReadingTime", typeof(string));
        dt.Columns.Add("RunTime", typeof(string));
        dt.Columns.Add("Temperature", typeof(double));
        dt.Columns.Add("ProjectGuid", typeof(Guid));
        dt.Columns.Add("CombineDateTime", typeof(string));
        return dt;
    }
}

public void SaveRawData(List<RawDataModel> data)
{
    DataTable dt = CreateRawDataTable;
    var count = data.Count;

    for (var i = 1; i < count; i++)
    {
        DataRow row = dt.NewRow();
        row["Id"] = data[i].Id;
        row["ProjectGuid"] = data[i].ProjectGuid;
        row["SerialNumber"] = data[i].SerialNumber;
        row["ReadingNumber"] = data[i].ReadingNumber;
        row["ReadingDate"] = data[i].ReadingDate;
        row["ReadingTime"] = data[i].ReadingTime;
        row["CombineDateTime"] = data[i].CombineDateTime;
        row["RunTime"] = data[i].RunTime;
        row["Temperature"] = data[i].Temperature;
        dt.Rows.Add(row);
    }

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (SqlTransaction tr = conn.BeginTransaction())
        {
            using (var sqlBulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tr))
            {
                sqlBulk.BatchSize = 1000;
                sqlBulk.DestinationTableName = "RawData";
                sqlBulk.WriteToServer(dt);
            }
            tr.Commit();
        }
    }
}
Is there another way to do this, or a better way to improve performance so that the upload time is reduced? It can take a long time, and I am seeing an ever-increasing use of memory, up to around 500MB.
TIA
You can improve performance by removing the DataTable and reading from the input stream directly.
SqlBulkCopy has a WriteToServer overload that accepts an IDataReader instead of an entire DataTable.
CsvHelper can parse CSV files using a StreamReader as input. It provides CsvDataReader as an IDataReader implementation on top of the CSV data. This allows reading directly from the input stream and writing to SqlBulkCopy.
The following method reads from an IFormFile, parses the stream using CsvHelper and uses the CSV's header fields to configure a SqlBulkCopy instance:
public async Task ToTable(IFormFile file, string table)
{
    using (var stream = file.OpenReadStream())
    using (var tx = new StreamReader(stream))
    using (var reader = new CsvReader(tx))
    using (var rd = new CsvDataReader(reader))
    {
        var headers = reader.Context.HeaderRecord;

        var bcp = new SqlBulkCopy(_connection)
        {
            DestinationTableName = table
        };

        //Assume the file headers and table fields have the same names
        foreach (var header in headers)
        {
            bcp.ColumnMappings.Add(header, header);
        }

        await bcp.WriteToServerAsync(rd);
    }
}
This way nothing is ever written to a temp file or cached in memory. The uploaded files are parsed and written to the database directly.
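For example, the upload action could call this method once per posted file; the table name here is illustrative and the connection is assumed to be configured elsewhere:

[HttpPost]
public async Task<IActionResult> UploadCsvFiles(ICollection<IFormFile> files)
{
    foreach (var file in files)
    {
        // Stream each upload straight into SQL Server; no temp files, no DataTable.
        await ToTable(file, "RawData");
    }
    return Ok();
}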
In addition to @Panagiotis's answer, why don't you interleave your file processing with the file upload? Wrap your file processing logic in an async method and change the loop to a Parallel.ForEach, processing each file as it arrives instead of waiting for all of them?
private static readonly object listLock = new Object(); // only once at class level

List<string> fileLocations = new List<string>();
Parallel.ForEach(files, async (formFile) =>
{
    var filePath = Path.GetTempFileName();
    if (formFile.Length > 0)
    {
        using (var stream = new FileStream(filePath, FileMode.Create))
        {
            await formFile.CopyToAsync(stream);
        }
        await ProcessFileInToDbAsync(filePath);
    }

    // Added lock for thread safety of the List
    lock (listLock)
    {
        fileLocations.Add(filePath);
    }
});
Thanks to @Panagiotis Kanavos, I was able to work out what to do. Firstly, the way I was calling the methods was leaving them in memory. The CSV file I have is in two parts: vertical metadata and then the usual horizontal information. So I needed to split them into two. Saving them as tmp files was also causing an overhead. It has gone from taking 5-6 minutes to now taking a minute, which for 100 files containing 8,500 rows isn't bad, I suppose.
Calling the method:
public async Task<IActionResult> UploadCsvFiles(ICollection<IFormFile> files, IFormCollection fc)
{
    foreach (var f in files)
    {
        var getData = new GetData(_configuration);
        await getData.SplitCsvData(f, uid);
    }
    return whatever;
}
This is the method doing the splitting:
public async Task SplitCsvData(IFormFile file, string uid)
{
    var data = string.Empty;
    var m = new List<string>();
    var r = new List<string>();
    var records = new List<string>();

    using (var stream = file.OpenReadStream())
    using (var reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            var header = line.Split(',')[0].ToString();
            bool parsed = int.TryParse(header, out int result);
            if (!parsed)
            {
                m.Add(line);
            }
            else
            {
                r.Add(line);
            }
        }
    }

    //TODO: Validation
    //This splits the list into the Meta data model. This is just a single object, with static fields.
    var metaData = SplitCsvMetaData.SplitMetaData(m, uid);
    DataTable dtm = CreateMetaData(metaData);
    var serialNumber = metaData.LoggerId;
    await SaveMetaData("MetaData", dtm);

    var lrd = new List<RawDataModel>();
    foreach (string row in r)
    {
        lrd.Add(new RawDataModel
        {
            Id = 0,
            SerialNumber = serialNumber,
            ReadingNumber = Convert.ToInt32(row.Split(',')[0]),
            ReadingDate = Convert.ToDateTime(row.Split(',')[1]).ToString("yyyy-MM-dd"),
            ReadingTime = Convert.ToDateTime(row.Split(',')[2]).ToString("HH:mm:ss"),
            RunTime = row.Split(',')[3].ToString(),
            Temperature = Convert.ToDouble(row.Split(',')[4]),
            ProjectGuid = uid.ToString(),
            CombineDateTime = Convert.ToDateTime(row.Split(',')[1] + " " + row.Split(',')[2]).ToString("yyyy-MM-dd HH:mm:ss")
        });
    }
    await SaveRawData("RawData", lrd);
}
I then use a DataTable for the metadata (which takes 20 seconds for 100 files), as I map the field names to the columns.
public async Task SaveMetaData(string table, DataTable dt)
{
    using (SqlBulkCopy sqlBulk = new SqlBulkCopy(_configuration.GetConnectionString("DefaultConnection"), SqlBulkCopyOptions.Default))
    {
        sqlBulk.DestinationTableName = table;
        await sqlBulk.WriteToServerAsync(dt);
    }
}
I then use FastMember for the large raw-data part, which is more like a traditional CSV.
public async Task SaveRawData(string table, IEnumerable<LogTagRawDataModel> lrd)
{
    using (SqlBulkCopy sqlBulk = new SqlBulkCopy(_configuration.GetConnectionString("DefaultConnection"), SqlBulkCopyOptions.Default))
    using (var reader = ObjectReader.Create(lrd, "Id", "SerialNumber", "ReadingNumber", "ReadingDate", "ReadingTime", "RunTime", "Temperature", "ProjectGuid", "CombineDateTime"))
    {
        sqlBulk.DestinationTableName = table;
        await sqlBulk.WriteToServerAsync(reader);
    }
}
I am sure this can be improved on, but for now, this works really well.

Set programmatically created ReportBook as HTML5 ReportSource

A user can select multiple orders, and download all the reports as one PDF.
We used PdfSmartCopy to merge the reports:
protected void Print(int[] order_ids)
{
    byte[] merged_reports;

    using (MemoryStream ms = new MemoryStream())
    using (Document doc = new Document())
    using (PdfSmartCopy copy = new PdfSmartCopy(doc, ms))
    {
        doc.Open();
        foreach (int order_id in order_ids)
        {
            Telerik.Reporting.InstanceReportSource reportSource = new Telerik.Reporting.InstanceReportSource();
            reportSource.ReportDocument = new OrderReport();
            reportSource.Parameters.Add(new Telerik.Reporting.Parameter("order_id", order_id));

            RenderingResult result = new ReportProcessor().RenderReport("PDF", reportSource, new Hashtable());
            using (PdfReader reader = new PdfReader(result.DocumentBytes))
            {
                copy.AddDocument(reader);
            }
        }
        doc.Close();
        merged_reports = ms.ToArray();
    }

    Response.Clear();
    Response.Cache.SetCacheability(HttpCacheability.NoCache);
    Response.Expires = -1;
    Response.Buffer = false;
    Response.ContentType = "application/pdf";
    Response.OutputStream.Write(merged_reports, 0, merged_reports.Length);
}
But we started using the HTML5 ReportViewer elsewhere and we want to use it here as well to be consistent. I thought of creating a ReportBook programmatically and setting it as the ReportSource of the ReportViewer, but the only thing I can set is a string. We have already used a ReportBook before, but that was an actual SomeReportBook.cs that we could set through new SomeReportBook().GetType().AssemblyQualifiedName;.
Any clue? Here is what I have at the moment:
protected void Print(int[] order_ids)
{
    Telerik.Reporting.ReportBook reportBook = new Telerik.Reporting.ReportBook();
    foreach (int order_id in order_ids)
    {
        Telerik.Reporting.InstanceReportSource reportSource = new Telerik.Reporting.InstanceReportSource();
        reportSource.ReportDocument = new OrderReport();
        reportSource.Parameters.Add(new Telerik.Reporting.Parameter("order_id", order_id));
        reportBook.ReportSources.Add(reportSource);
    }

    this.ReportViewer.ReportSource = new Telerik.ReportViewer.Html5.WebForms.ReportSource()
    {
        Identifier = // Can't use reportBook.GetType().AssemblyQualifiedName
    };
}
I have also struggled with this challenge for quite some time; I would like to share in case someone else faces the same challenge. Kindly do this:
1. Create a class that inherits from Telerik.Reporting.ReportBook.
2. Create a method that loads all your reports in your report book class, i.e.
this.ReportSources.Add(new TypeReportSource
{
    TypeName = typeof(Report1).AssemblyQualifiedName
});
3. Call your method in your class constructor.
4. Use the following code to set the report viewer source:
var reportSource = new Telerik.ReportViewer.Html5.WebForms.ReportSource();
reportSource.IdentifierType = IdentifierType.TypeReportSource;
reportSource.Identifier = typeof(ReportCatalog).AssemblyQualifiedName; // or "namespace.class, assembly", e.g. "MyReports.Report1, MyReportsLibrary"
reportSource.Parameters.Add("Parameter1", "Parameter1");
reportSource.Parameters.Add("Parameter2", "Parameter2");
ReportsViewer1.ReportSource = reportSource;
ReportCatalog = the newly created class that inherits from Telerik.Reporting.ReportBook.
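Putting steps 1-3 together, a minimal sketch of such a report book class could look like this (Report1 and Report2 stand in for your own report classes):

public class ReportCatalog : Telerik.Reporting.ReportBook
{
    public ReportCatalog()
    {
        // Step 3: the constructor loads the report sources.
        LoadReports();
    }

    private void LoadReports()
    {
        // Step 2: add each report to the book by its assembly-qualified type name.
        this.ReportSources.Add(new Telerik.Reporting.TypeReportSource
        {
            TypeName = typeof(Report1).AssemblyQualifiedName
        });
        this.ReportSources.Add(new Telerik.Reporting.TypeReportSource
        {
            TypeName = typeof(Report2).AssemblyQualifiedName
        });
    }
}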

Inserting json documents in DocumentDB

In DocumentDB documentation examples, I find insertion of C# objects.
// Create the Andersen family document.
Family AndersenFamily = new Family
{
    Id = "AndersenFamily",
    LastName = "Andersen",
    Parents = new Parent[] {
        new Parent { FirstName = "Thomas" },
        new Parent { FirstName = "Mary Kay" }
    },
    IsRegistered = true
};

await client.CreateDocumentAsync(documentCollection.DocumentsLink, AndersenFamily);
In my case, I'm receiving JSON strings from the application client and would like to insert them in DocumentDB without deserializing them. I could not find any examples of doing something similar.
Any help is sincerely appreciated.
Thanks
Copied from the published .NET sample code:
private static async Task UseStreams(string colSelfLink)
{
    var dir = new DirectoryInfo(@".\Data");
    var files = dir.EnumerateFiles("*.json");
    foreach (var file in files)
    {
        using (var fileStream = new FileStream(file.FullName, FileMode.Open, FileAccess.Read))
        {
            Document doc = await client.CreateDocumentAsync(colSelfLink, Resource.LoadFrom<Document>(fileStream));
            Console.WriteLine("Created Document: {0}", doc);
        }
    }

    //Read one of the documents created above directly into a Json string
    Document readDoc = client.CreateDocumentQuery(colSelfLink).Where(d => d.Id == "JSON1").AsEnumerable().First();
    string content = JsonConvert.SerializeObject(readDoc);

    //Update a document with some Json text.
    //Here we're replacing a previously created document with some new text and even introducing a new Property, Status=Cancelled
    using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes("{\"id\": \"JSON1\",\"PurchaseOrderNumber\": \"PO18009186470\",\"Status\": \"Cancelled\"}")))
    {
        await client.ReplaceDocumentAsync(readDoc.SelfLink, Resource.LoadFrom<Document>(memoryStream));
    }
}
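Applied to the original question, a JSON string received from the client can be wrapped in a MemoryStream the same way, so nothing has to be deserialized into a C# class first. A minimal sketch using the same older SDK types as above (the JSON content here is illustrative):

// Sketch: insert a raw JSON string without mapping it to a typed object first.
string json = "{\"id\": \"JSON2\", \"PurchaseOrderNumber\": \"PO18009186471\"}";
using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(json)))
{
    Document doc = await client.CreateDocumentAsync(colSelfLink, Resource.LoadFrom<Document>(ms));
}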

Copying an OpenXML image from one document to another

We have conditional Footers that INCLUDETEXT based on the client:
IF $CLIENT = "CLIENT1" "{INCLUDETEXT "CLIENT1HEADER.DOCX"}" ""
Depending on our document, there could be a varying amount of IF/ELSE, and these all work correctly for merging the correct files in the correct place.
However, some of these documents may have client specific images/branding, which also need to be copied across from the INCLUDETEXT file.
Below is the method that is called to replace any Picture elements that exist in the IEnumerable<Run> that is copied from the Source document to the Target document.
The image is copied fine, however it doesn't appear to update the RID in my Picture or add a record into the .xml.rels files. (I even tried adding a ForEach to add it to all the headers and footers, to see if this made any difference.)
private void InsertImagesFromOldDocToNewDoc(WordprocessingDocument source, WordprocessingDocument target, IEnumerable<Picture> pics)
{
    IEnumerable<Picture> imageElements = source.MainDocumentPart.Document.Descendants<Run>()
        .Where(x => x.Descendants<Picture>().FirstOrDefault() != null)
        .Select(x => x.Descendants<Picture>().FirstOrDefault());

    foreach (Picture pic in pics) //the new pics
    {
        Picture oldPic = imageElements.Where(x => x.Equals(pic)).FirstOrDefault();
        if (oldPic != null)
        {
            string imageId = "";
            ImageData shape = oldPic.Descendants<ImageData>().FirstOrDefault();
            ImagePart p = source.MainDocumentPart.GetPartById(shape.RelationshipId) as ImagePart;
            ImagePart newPart = target.MainDocumentPart.AddPart<ImagePart>(p);
            newPart.FeedData(p.GetStream());
            shape.RelId = target.MainDocumentPart.GetIdOfPart(newPart);
            string relPart = target.MainDocumentPart.CreateRelationshipToPart(newPart);
        }
    }
}
Has anyone come across this issue before?
It appears the OpenXML SDK documentation is a 'little' sparse...
Late reaction, but this thread helped me a lot to get it working. Here is my solution for copying a document with images:
private static void CopyDocumentWithImages(string path)
{
    if (!Path.GetFileName(path).StartsWith("~$"))
    {
        using (var source = WordprocessingDocument.Open(path, false))
        {
            using (var newDoc = source.CreateNew(path.Replace(".docx", "-images.docx")))
            {
                foreach (var e in source.MainDocumentPart.Document.Body.Elements())
                {
                    var clonedElement = e.CloneNode(true);

                    clonedElement.Descendants<DocumentFormat.OpenXml.Drawing.Blip>()
                        .ToList().ForEach(blip =>
                        {
                            var newRelation = newDoc.CopyImage(blip.Embed, source);
                            blip.Embed = newRelation;
                        });

                    clonedElement.Descendants<DocumentFormat.OpenXml.Vml.ImageData>()
                        .ToList().ForEach(imageData =>
                        {
                            var newRelation = newDoc.CopyImage(imageData.RelationshipId, source);
                            imageData.RelationshipId = newRelation;
                        });

                    newDoc.MainDocumentPart.Document.Body.AppendChild(clonedElement);
                }
                newDoc.Save();
            }
        }
    }
}
CopyImage:
public static string CopyImage(this WordprocessingDocument newDoc, string relId, WordprocessingDocument org)
{
    var p = org.MainDocumentPart.GetPartById(relId) as ImagePart;
    var newPart = newDoc.MainDocumentPart.AddPart(p);
    newPart.FeedData(p.GetStream());
    return newDoc.MainDocumentPart.GetIdOfPart(newPart);
}
CreateNew:
public static WordprocessingDocument CreateNew(this WordprocessingDocument org, string name)
{
    var doc = WordprocessingDocument.Create(name, WordprocessingDocumentType.Document);
    doc.AddMainDocumentPart();
    doc.MainDocumentPart.Document = new Document(new Body());

    using (var streamReader = new StreamReader(org.MainDocumentPart.ThemePart.GetStream()))
    using (var streamWriter = new StreamWriter(doc.MainDocumentPart.AddNewPart<ThemePart>().GetStream(FileMode.Create)))
    {
        streamWriter.Write(streamReader.ReadToEnd());
    }

    using (var streamReader = new StreamReader(org.MainDocumentPart.StyleDefinitionsPart.GetStream()))
    using (var streamWriter = new StreamWriter(doc.MainDocumentPart.AddNewPart<StyleDefinitionsPart>().GetStream(FileMode.Create)))
    {
        streamWriter.Write(streamReader.ReadToEnd());
    }

    return doc;
}
Stuart,
I had faced the same problem when I was trying to copy the numbering styles from one document to the other.
I think what Word does internally is this: whenever an object is copied from one document to the other, the ID for that object is not copied over to the new document; instead, a new ID is assigned to it.
You'll have to get the ID after the image has been copied and then replace it everywhere your image has been used.
I hope this helps; this is what I used to copy numbering styles.
Cheers
