combining byte arrays to pdf - c#

I have a byte array which I get from an API:
byte[] sticker = db.call_API_print_sticker(Id);
I have to call this method a number of times and then convert the results to a PDF. I want to store each result in an array of arrays and convert them all once I have collected them.
How do I store them, and how do I then combine the individual PDF byte arrays into a single PDF?

Using PDFsharp (installed as a NuGet package), I wrote the following C# method that works purely with byte arrays:
public byte[] CombinePDFs(List<byte[]> srcPDFs)
{
    // Requires: using PdfSharp.Pdf; using PdfSharp.Pdf.IO;
    using (var ms = new MemoryStream())
    {
        // Build an empty output document and import every page of every source PDF into it.
        using (var resultPDF = new PdfDocument())
        {
            foreach (var pdf in srcPDFs)
            {
                using (var src = new MemoryStream(pdf))
                using (var srcPDF = PdfReader.Open(src, PdfDocumentOpenMode.Import))
                {
                    for (var i = 0; i < srcPDF.PageCount; i++)
                    {
                        resultPDF.AddPage(srcPDF.Pages[i]);
                    }
                }
            }
            resultPDF.Save(ms);
            return ms.ToArray();
        }
    }
}
So the above method takes a list of source PDFs as byte arrays, combines them, and returns the resulting PDF as a single byte array.
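For the scenario in the question, the calling code could look roughly like this (a sketch: db.call_API_print_sticker comes from the question, while stickerIds stands in for whatever collection of IDs you loop over):
var stickerPdfs = new List<byte[]>();
foreach (var id in stickerIds)                      // stickerIds: assumed collection of IDs
{
    stickerPdfs.Add(db.call_API_print_sticker(id)); // each call returns one sticker as a PDF byte array
}
byte[] combined = CombinePDFs(stickerPdfs);
File.WriteAllBytes(@"stickers.pdf", combined);      // or return/stream the bytes instead of writing to disk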

The byte[] is probably just one PDF, so I would think that you could just do
System.IO.File.WriteAllBytes(@"sticker.pdf", sticker);
If that is not the case, the easiest way would be to use a NuGet package, e.g. PDFsharp, to combine multiple PDFs into one.
An example of combining PDFs
The gist (which assumes each sticker contains one page):
IEnumerable<byte[]> stickers; // the sticker PDFs collected from the API
using (var combinedPdf = new PdfDocument(@"stickers.pdf"))
{
    foreach (var pdf in stickers)
    {
        using (MemoryStream ms = new MemoryStream(pdf))
        {
            // Import mode is required to copy pages into another document.
            var someSticker = PdfReader.Open(ms, PdfDocumentOpenMode.Import);
            combinedPdf.AddPage(someSticker.Pages[0]);
        }
    }
    combinedPdf.Close(); // writes the combined document to stickers.pdf
}

Related

Converting html to pdf using iText.Html2pdf is taking too long

Hi, I'm trying to convert an HTML string to PDF using iText.Html2pdf, and it's taking almost 3 minutes.
The code is the following (pretty much a basic example):
public byte[] ConvertToPdf(string email)
{
    using (var memoryStream = new MemoryStream())
    {
        var properties = new ConverterProperties()
            .SetBaseUri(".")
            .SetCreateAcroForm(false)
            .SetCssApplierFactory(new DefaultCssApplierFactory())
            .SetFontProvider(new DefaultFontProvider())
            .SetMediaDeviceDescription(MediaDeviceDescription.CreateDefault())
            .SetOutlineHandler(new OutlineHandler())
            .SetTagWorkerFactory(new DefaultTagWorkerFactory());
        memoryStream.Position = 0;
        HtmlConverter.ConvertToPdf(email, memoryStream, properties);
        return memoryStream.ToArray();
    }
}
The file in question generates 6 pages.
Any suggestion?

How do I copy multiple streams into 1 for the client to download?

I'm using C# and ASP.NET Core 3 and have this right now.
string templatePath = Path.Combine(_webHostEnvironment.WebRootPath, @"templates\pdf\test.pdf");
Stream finalStream = new MemoryStream();
foreach (Info p in list)
{
    Stream pdfInputStream = new FileStream(path: templatePath, mode: FileMode.Open);
    Stream outStream = PdfService.FillForm(pdfInputStream, p);
    outStream.Position = 0;
    outStream.CopyTo(finalStream);
    outStream.Dispose();
    pdfInputStream.Dispose();
}
finalStream.Position = 0;
return File(finalStream, "application/pdf", "test.pdf");
Right now I just get the first PDF when there should be 3. How do I combine all the PDF streams created in the loop into one PDF? I'm using iTextSharp and this as a guide to produce the FillForm code:
https://medium.com/@taithienbo/fill-out-a-pdf-form-using-itextsharp-for-net-core-4b323cb58459
You can't just combine PDFs by appending them into a single stream :-)
You can add each PDF stream to a list, ask iTextSharp to combine them, and then return the newly created stream.
List<Stream> pdfStreams = new List<Stream>();
foreach (var item in list)
{
    // Open the template, fill the form for this item (your FillForm code),
    // and add the resulting stream to the list.
    pdfStreams.Add(outStream);
}
var newStream = Merge(pdfStreams);
return File(newStream, "application/pdf", "test.pdf");
I don't know iTextSharp, but it seems it can merge PDFs: https://weblogs.sqlteam.com/mladenp/2014/01/10/simple-merging-of-pdf-documents-with-itextsharp-5-4-5/
Edit
By the way, you could use a using statement for the streams (then you wouldn't have to call Dispose yourself), and depending on how heavy your PDFs are you might also want to consider CopyToAsync.
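For reference, a minimal sketch of what Merge could look like using iTextSharp 5's PdfSmartCopy (the method name and details here are assumptions for illustration, not the linked article's code):
// requires: using System.Collections.Generic; using System.IO; using iTextSharp.text.pdf;
public static Stream Merge(IEnumerable<Stream> pdfStreams)
{
    var output = new MemoryStream();
    var document = new iTextSharp.text.Document();
    var copy = new PdfSmartCopy(document, output) { CloseStream = false }; // keep the MemoryStream open for the caller
    document.Open();
    foreach (var pdfStream in pdfStreams)
    {
        pdfStream.Position = 0;
        var reader = new PdfReader(pdfStream);
        copy.AddDocument(reader);   // copies every page of this filled-in PDF
        reader.Close();
    }
    document.Close();               // finishes writing the combined PDF to output
    output.Position = 0;
    return output;
}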

convert compress xmldocument as zip and get as byte array

I am building an XmlDocument in memory (I am not writing it to disk). I need to be able to create a zip archive that contains the XML file and then get the zip archive as a byte array, all of this without actually writing or creating anything on the hard disk. Is this possible?
I should mention that I am trying to do this in C#.
var buffer = new MemoryStream();
using (buffer)
using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create))
{
    var entry = zip.CreateEntry("content.xml", CompressionLevel.Optimal);
    using (var stream = entry.Open())
    {
        xmlDoc.Save(stream);
    }
}
// The ZipArchive must be disposed first so the archive's central directory is written;
// MemoryStream.ToArray still works after the stream has been disposed.
var bytes = buffer.ToArray();
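And, as a sanity check (a small illustration, not part of the original answer), the XML can be read straight back out of that byte array without touching the disk:
using (var readBuffer = new MemoryStream(bytes))
using (var zip = new ZipArchive(readBuffer, ZipArchiveMode.Read))
using (var entryStream = zip.GetEntry("content.xml").Open())
{
    var roundTripped = new XmlDocument();   // System.Xml
    roundTripped.Load(entryStream);
}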

How to decompress .zip files in c# without extracting to new location

How can I decompress (.zip) files without extracting to a new location in the .NET Framework? Specifically, I'm trying to read a filename.csv.zip into a DataTable.
I'm aware of ExtractToDirectory (available on ZipArchive), but I just want to extract it into an object in C#, and I would like to not create a new file.
Hoping to be able to do this w/o third party libraries, but I'll take what I can get.
There may be some bugs because I never tested this, but here you go:
List<string> fileContents = new List<string>();
using (ZipArchive archive = ZipFile.OpenRead(zipPath))
{
    foreach (ZipArchiveEntry entry in archive.Entries)
    {
        using (StreamReader r = new StreamReader(entry.Open()))
        {
            fileContents.Add(r.ReadToEnd());
        }
    }
}
Basically you use ZipFile.OpenRead to open the archive and then iterate through each entry. At that point you can use a StreamReader to read each entry. From there you could also write the contents out to a file, or read the entry's file name if you want to; my code doesn't do this, a bit of laziness on my part.
Keep in mind that a compressed archive might contain multiple files. To handle that, you need to iterate through all entries of the zip file, retrieve them, and treat each one separately.
The sample below converts a sequence of bytes into a list of strings, where each string is the content of one of the files included in the zipped folder:
public static IEnumerable<string> DecompressToEntriesTextContext(byte[] input)
{
    var zipEntriesContext = new List<string>();
    using (var compressedStream = new MemoryStream(input))
    using (var zip = new ZipArchive(compressedStream, ZipArchiveMode.Read))
    {
        foreach (var entry in zip.Entries)
        {
            using (var entryStream = entry.Open())
            using (var memoryEntryStream = new MemoryStream())
            using (var reader = new StreamReader(memoryEntryStream))
            {
                entryStream.CopyTo(memoryEntryStream);
                memoryEntryStream.Position = 0;
                zipEntriesContext.Add(reader.ReadToEnd());
            }
        }
    }
    return zipEntriesContext;
}
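To go the last step the question asks for, loading filename.csv.zip into a DataTable, here is a rough sketch (an assumption-laden outline: it expects a plain comma-separated file with a header row, no quoted fields, and the CSV as the first entry in the archive):
// requires: using System.Data; using System.IO; using System.IO.Compression; using System.Linq;
public static DataTable ReadCsvFromZip(string zipPath)
{
    using (ZipArchive archive = ZipFile.OpenRead(zipPath))
    using (var reader = new StreamReader(archive.Entries.First().Open()))
    {
        var table = new DataTable();
        foreach (string column in reader.ReadLine().Split(','))
            table.Columns.Add(column.Trim());

        string line;
        while ((line = reader.ReadLine()) != null)
            table.Rows.Add(line.Split(','));   // DataRowCollection.Add(params object[]) maps values by position
        return table;
    }
}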

Calculate MD5 checksum for a file

I'm using iTextSharp to read the text from a PDF file. However, there are times I cannot extract text, because the PDF file only contains images. I download the same PDF files every day, and I want to see if a PDF has been modified. If the text and the modification date cannot be obtained, is an MD5 checksum the most reliable way to tell whether the file has changed?
If it is, some code samples would be appreciated, because I don't have much experience with cryptography.
It's very simple using System.Security.Cryptography.MD5:
using (var md5 = MD5.Create())
{
    using (var stream = File.OpenRead(filename))
    {
        return md5.ComputeHash(stream);
    }
}
(I believe that actually the MD5 implementation used doesn't need to be disposed, but I'd probably still do so anyway.)
How you compare the results afterwards is up to you; you can convert the byte array to base64 for example, or compare the bytes directly. (Just be aware that arrays don't override Equals. Using base64 is simpler to get right, but slightly less efficient if you're really only interested in comparing the hashes.)
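For example (a small illustration, not from the original answer; CalculateHash is a hypothetical helper that returns md5.ComputeHash(stream) for a file):
byte[] hash1 = CalculateHash("report_yesterday.pdf");   // hypothetical helper
byte[] hash2 = CalculateHash("report_today.pdf");

// hash1.Equals(hash2) would compare references, not contents, so use one of these instead:
bool sameByBytes  = hash1.SequenceEqual(hash2);          // requires using System.Linq;
bool sameByBase64 = Convert.ToBase64String(hash1) == Convert.ToBase64String(hash2);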
If you need to represent the hash as a string, you could convert it to hex using BitConverter:
static string CalculateMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
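Typical usage for the scenario in the question (the file paths here are just placeholders), comparing yesterday's download with today's:
string oldHash = CalculateMD5(@"C:\downloads\report_yesterday.pdf");
string newHash = CalculateMD5(@"C:\downloads\report_today.pdf");
bool fileChanged = oldHash != newHash;   // both are lower-case hex strings, so ordinal comparison is fine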
This is how I do it:
using System.IO;
using System.Security.Cryptography;
using System.Text;

public string checkMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            // Note: decoding raw hash bytes with Encoding.Default is lossy (invalid byte
            // sequences may be replaced), so a hex or base64 representation, as in the
            // answers above, is safer if you need a printable or comparable string.
            return Encoding.Default.GetString(md5.ComputeHash(stream));
        }
    }
}
I know this question was already answered, but this is what I use:
using (FileStream fStream = File.OpenRead(filename)) {
    return GetHash<MD5>(fStream);
}
Where GetHash:
public static String GetHash<T>(Stream stream) where T : HashAlgorithm {
    StringBuilder sb = new StringBuilder();
    MethodInfo create = typeof(T).GetMethod("Create", new Type[] {});
    using (T crypt = (T) create.Invoke(null, null)) {
        byte[] hashBytes = crypt.ComputeHash(stream);
        foreach (byte bt in hashBytes) {
            sb.Append(bt.ToString("x2"));
        }
    }
    return sb.ToString();
}
Probably not the best way, but it can be handy.
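For instance (a usage sketch, not from the original answer), the same helper works unchanged for other hash algorithms:
using (FileStream fs = File.OpenRead(filename)) {
    string md5Hex = GetHash<MD5>(fs);
    fs.Position = 0;                       // rewind before hashing the same stream again
    string sha256Hex = GetHash<SHA256>(fs);
}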
Here is a slightly simpler version that I found. It reads the entire file in one go and only requires a single using statement.
byte[] ComputeHash(string filePath)
{
    using (var md5 = MD5.Create())
    {
        return md5.ComputeHash(File.ReadAllBytes(filePath));
    }
}
I know that I am late to the party, but I ran a test before actually implementing the solution.
I tested against the built-in MD5 class and also md5sum.exe. In my case the built-in class took 13 seconds, whereas md5sum.exe took around 16-18 seconds on every run.
DateTime current = DateTime.Now;
string file = @"C:\text.iso"; // it's a 2.5 GB file
string output;
using (var md5 = MD5.Create())
{
    using (var stream = File.OpenRead(file))
    {
        byte[] checksum = md5.ComputeHash(stream);
        output = BitConverter.ToString(checksum).Replace("-", String.Empty).ToLower();
        Console.WriteLine("Total seconds : " + (DateTime.Now - current).TotalSeconds.ToString() + " " + output);
    }
}
For dynamically generated PDFs, the creation and modification dates will always be different. You have to remove them or set them to a constant value, and only then generate the MD5 hashes to compare. You can use iTextSharp's PdfStamper to remove or update those dates.
In addition to the methods answered above, if you're comparing PDFs you need to amend the creation and modification dates or the hashes won't match.
For PDFs generated with QuestPDF, you'll need to override CreationDate and ModifiedDate in the document metadata.
public class PdfDocument : IDocument
{
    ...
    public DocumentMetadata GetMetadata()
    {
        return new()
        {
            CreationDate = DateTime.MinValue,
            ModifiedDate = DateTime.MinValue,
        };
    }
    ...
}
https://www.questpdf.com/concepts/document-metadata.html
