Merge multiple word documents into one using OpenXML and XElement

As the title states, I am trying to merge multiple Word (.docx) files into one Word document. Each of these documents is one page long. I am using some of the code from this post in this implementation. The issue I am running into is that only the first document gets written properly; every other iteration appends a new document, but its contents are the same as the first.
Here is the code I am using:
//list that holds the file paths
List<String> fileNames = new List<string>();
fileNames.Add("filePath");
fileNames.Add("filePath");
fileNames.Add("filePath");
fileNames.Add("filePath");
fileNames.Add("filePath");
//get the first document
MemoryStream mainStream = new MemoryStream();
byte[] buffer = File.ReadAllBytes(fileNames[0]);
mainStream.Write(buffer, 0, buffer.Length);
using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
{
//xml for the new document
XElement newBody = XElement.Parse(mainDocument.MainDocumentPart.Document.Body.OuterXml);
//iterate through each file
for (int i = 1; i < fileNames.Count; i++)
{
//read in the document
byte[] tempBuffer = File.ReadAllBytes(fileNames[i]);
WordprocessingDocument tempDocument = WordprocessingDocument.Open(new MemoryStream(tempBuffer), true);
//new documents XML
XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);
//add the new xml
newBody.Add(tempBody);
string str = newBody.ToString();
//write to the main document and save
mainDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
mainDocument.MainDocumentPart.Document.Save();
mainDocument.Package.Flush();
tempBuffer = null;
}
//write entire stream to new file
FileStream fileStream = new FileStream("xmltest.docx", FileMode.Create);
mainStream.WriteTo(fileStream);
//ret = mainStream.ToArray();
mainStream.Close();
mainStream.Dispose();
}
Again, the problem is that each new document being created has the same content as the first document. So when I run this, the output is a document with five identical pages. I've tried switching the order of the documents in the list and get the same result, so it is nothing specific to one document.
Could anyone suggest what I am doing wrong here? I'm looking through it and I can't explain the behavior I am seeing. Any suggestions would be appreciated. Thanks much!
Edit: I'm thinking this may have something to do with the fact that the documents I am trying to merge have been generated with custom XML parts. I suspect that the XPath expressions in the documents are somehow pointing to the same content. The thing is, I can open each of these documents and see the proper content; it's only when I merge them that I see the issue.

This solution uses DocumentFormat.OpenXml
public static void Join(params string[] filepaths)
{
    //filepaths = new[] { "D:\\one.docx", "D:\\two.docx", "D:\\three.docx", "D:\\four.docx", "D:\\five.docx" };
    if (filepaths != null && filepaths.Length > 1)
        using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filepaths[0], true))
        {
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            for (int i = 1; i < filepaths.Length; i++)
            {
                string altChunkId = "AltChunkId" + i;
                AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.WordprocessingML, altChunkId);
                using (FileStream fileStream = File.Open(filepaths[i], FileMode.Open))
                {
                    chunk.FeedData(fileStream);
                }
                DocumentFormat.OpenXml.Wordprocessing.AltChunk altChunk = new DocumentFormat.OpenXml.Wordprocessing.AltChunk();
                altChunk.Id = altChunkId;
                //new page, if you like it...
                mainPart.Document.Body.AppendChild(new Paragraph(new Run(new Break() { Type = BreakValues.Page })));
                //next document
                mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
            }
            mainPart.Document.Save();
            //no explicit Close() needed: leaving the using block disposes and closes the document
        }
}

The way you merge may not work reliably. You can try one of these approaches:
Use AltChunk, as described in http://blogs.msdn.com/b/ericwhite/archive/2008/10/27/how-to-use-altchunk-for-document-assembly.aspx
Use the DocumentBuilder.BuildDocument method from http://powertools.codeplex.com/
If you still face a similar issue, you can find the databound controls before merging and assign data to them from the CustomXml part. This approach is implemented in the AssignContentFromCustomXmlPartForDataboundControl method of the OpenXmlHelper class. The code can be downloaded from http://worddocgenerator.codeplex.com/
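One concrete reason the XElement approach in the question is fragile, independent of the custom XML parts: XElement.Add called with an element nests that whole element as a child, so a second w:body ends up inside the first instead of its paragraphs being appended. The sketch below shows the difference using only System.Xml.Linq and stand-in bodies; the class and method names are made up for illustration, and it is not claimed to be the cause of the identical pages.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BodyMergeSketch
{
    // Namespace URI behind the w: prefix in WordprocessingML.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a stand-in <w:body> holding one paragraph with the given text.
    public static XElement MakeBody(string text)
    {
        return new XElement(W + "body",
            new XElement(W + "p",
                new XElement(W + "r",
                    new XElement(W + "t", text))));
    }

    // What the question's code does: the second body becomes a child of the first.
    public static XElement MergeNested(XElement main, XElement other)
    {
        var result = new XElement(main); // work on a copy of the first body
        result.Add(other);
        return result;
    }

    // Alternative: append only the children of the second body.
    public static XElement MergeChildren(XElement main, XElement other)
    {
        var result = new XElement(main); // work on a copy of the first body
        result.Add(other.Elements());
        return result;
    }
}
```

A real body usually ends with a w:sectPr element, which this sketch does not handle, so treat it as an illustration of the nesting behavior rather than a working merge.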

Related

Adding a HTML page as a last page to PDF document

I am creating a PDF Document consisting 6 images (1 Image on 1 Page) using iTextSharp.
I need to add a HTML Page as a last page after the 6th Image.
I have tried the below, but the HTML does not get added on a new page, instead gets attached immediately below the 5th Image.
Please advise how to make the HTML get added on the last page.
Code for reference:
string ImagePath = HttpContext.Current.Server.MapPath("~/Images/");
string[] fileNames = System.IO.Directory.GetFiles(ImagePath);
string outputFileNames = "Test.pdf";
string outputFilePath = System.Web.Hosting.HostingEnvironment.MapPath("~/Pdf/" + outputFileNames);
Document doc = new Document(PageSize.A4, 20, 20, 20, 20);
System.IO.Stream st = new FileStream(outputFilePath, FileMode.Create, FileAccess.Write);
PdfWriter writer = PdfWriter.GetInstance(doc, st);
doc.Open();
writer.PageEvent = new Footer();
for (int i = 0; i < fileNames.Length; i++)
{
string fname = fileNames[i];
if (System.IO.File.Exists(fname) && Path.GetExtension(fname) == ".png")
{
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(fname);
img.Border = iTextSharp.text.Rectangle.BOX;
img.BorderColor = iTextSharp.text.BaseColor.BLACK;
doc.Add(img);
}
}
byte[] pdf; // result will be here
var cssText = File.ReadAllText(MapPath("~/Style1.css"));
var html = File.ReadAllText(MapPath("~/HtmlPage1.html"));
using ( var memoryStream = new MemoryStream())
{
using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
{
using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, htmlMemoryStream, cssMemoryStream);
}
}
pdf = memoryStream.ToArray();
//document.Add(new Paragraph(Encoding.UTF8.GetString(pdf)));
}
doc.NewPage();
doc.Add(new Paragraph(Encoding.UTF8.GetString(pdf)));
doc.Close();
writer.Close();
Any help is appreciated

In contrast to what you assume according to your code comments, pdf is not where the result will be. It remains empty:
byte[] pdf; // result will be here
...
using ( var memoryStream = new MemoryStream())
{
... code not accessing memoryStream ...
pdf = memoryStream.ToArray();
//document.Add(new Paragraph(Encoding.UTF8.GetString(pdf)));
}
doc.NewPage();
doc.Add(new Paragraph(Encoding.UTF8.GetString(pdf)));
Thus, you add the new page before adding an empty paragraph, after the converted html already has been added to the document.
Actually it is added during
XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, htmlMemoryStream, cssMemoryStream);
So you have to add the new page before that. Thus, the following replacing everything from your byte[] pdf; on should do the job:
var cssText = File.ReadAllText(MapPath("~/Style1.css"));
var html = File.ReadAllText(MapPath("~/HtmlPage1.html"));
using (var cssMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssText)))
{
using (var htmlMemoryStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
doc.NewPage();
XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, htmlMemoryStream, cssMemoryStream);
}
}
doc.Close();
As an aside, don't close the writer! It is closed implicitly when the doc is closed. Closing it again does nothing at best and may cause damage otherwise.
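The earlier point that pdf stays empty can be checked in isolation. The following is a minimal sketch using only the base class library (the class and method names are made up for illustration): a MemoryStream that nothing writes to yields a zero-length array, which decodes to an empty string.

```csharp
using System;
using System.IO;
using System.Text;

public static class EmptyStreamSketch
{
    // Mirrors the question's pattern: the stream is created, never written to, then read back.
    public static byte[] ReadUntouchedStream()
    {
        using (var memoryStream = new MemoryStream())
        {
            return memoryStream.ToArray();
        }
    }

    // Decodes the bytes the same way the question's code does before building the Paragraph.
    public static string AsText(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }
}
```

So the Paragraph built from that string carries no text, and the HTML content has to come from the ParseXHtml call itself.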
In a comment you claimed
but this also does not resolve the issue... the pdf content still get added after the image and then continued on new page.
So I tested the proposed change. Obviously I don't have your environment, nor your image, HTML, and CSS files. Thus, I used my own: a small screenshot and "<html><body><h1>Test</h1><p>This is a test piece of html</p></body></html>".
With your code I get:
With the code changed as described above I get
My impression here is that the proposed code change does resolve the issue. The html content is added on a new page.
Thus, apparently you either applied the proposed change incorrectly, executed old code, or inspected an old result.

ITextSharp Parsing HTML with Images in it: It parses correctly but wont show images

I am trying to generate a PDF from HTML using the iTextSharp library. I am able to create the PDF with the HTML text converted to PDF text/paragraphs.
My problem: the PDF does not show my images (the img elements from the HTML). None of the img elements in my HTML get displayed in the PDF. Is it possible for iTextSharp to parse HTML and display images? I really hope so, otherwise I am stuffed :(
I am pointing to the correct directory where the images are (using IMG_BASEURL), but they are just not showing.
My code:
// mainContents variable is a string containing my HTML
var document = new Document(PageSize.A4, 50, 50, 80, 100);
var output = new MemoryStream();
var writer = PdfWriter.GetInstance(document, output);
document.Open();
Hashtable providers = new Hashtable();
providers.Add("img_baseurl","C:/users/xx/VisualStudio/Projects/myproject/");
var parsedHtmlElements = HTMLWorker.ParseToList(new StringReader(mainContents), null, providers);
foreach (var htmlElement in parsedHtmlElements)
document.Add(htmlElement as IElement);
document.Close();

Every time that I've encountered this the problem was that the image was too large for the canvas. More specifically, even a naked IMG tag internally will get wrapped in a Chunk that will get wrapped in a Paragraph, and I think that the image is overflowing the Paragraph but I'm not 100% sure.
The two easy fixes are to either enlarge the canvas or to specify image dimensions on the HTML IMG tag. The third more complex route would be to use an additional provider IMG_PROVIDER. To do this you need to implement the IImageProvider interface. Below is a very simple version of one
public class ImageThing : IImageProvider {
//Store a reference to the main document so that we can access the page size and margins
private Document MainDoc;
//Constructor
public ImageThing(Document doc) {
this.MainDoc = doc;
}
Image IImageProvider.GetImage(string src, IDictionary<string, string> attrs, ChainedProperties chain, IDocListener doc) {
//Prepend the src tag with our path. NOTE, when using HTMLWorker.IMG_PROVIDER, HTMLWorker.IMG_BASEURL gets ignored unless you choose to implement it on your own
src = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\" + src;
//Get the image. NOTE, this will attempt to download/copy the image, you'd really want to sanity check here
Image img = Image.GetInstance(src);
//Make sure we got something
if (img == null) return null;
//Determine the usable area of the canvas. NOTE, this doesn't take into account the current "cursor" position so this might create a new blank page just for the image
float usableW = this.MainDoc.PageSize.Width - (this.MainDoc.LeftMargin + this.MainDoc.RightMargin);
float usableH = this.MainDoc.PageSize.Height - (this.MainDoc.TopMargin + this.MainDoc.BottomMargin);
//If the downloaded image is bigger than either width and/or height then shrink it
if (img.Width > usableW || img.Height > usableH) {
img.ScaleToFit(usableW, usableH);
}
//return our image
return img;
}
}
To use this provider just add it to the provider collection like you did with HTMLWorker.IMG_BASEURL:
providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));
It should be noted that if you use HTMLWorker.IMG_PROVIDER that you are responsible for figuring out everything about the image. The code above assumes that all image paths need to be prepended with a constant string, you'll probably want to update this and check for HTTP at the start. Also, because we're saying that we want to completely handle the image processing pipeline the provider HTMLWorker.IMG_BASEURL is no longer needed.
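The HTTP check mentioned above could look roughly like the sketch below. The helper name Resolve and the baseDirectory parameter are hypothetical; the idea is only that absolute http(s) sources pass through unchanged while everything else is combined with a base directory.

```csharp
using System;
using System.IO;

public static class ImageSourceSketch
{
    // Hypothetical helper: returns src unchanged when it is an absolute http(s) URL,
    // otherwise combines it with baseDirectory.
    public static string Resolve(string src, string baseDirectory)
    {
        if (src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            src.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            return src;
        }
        return Path.Combine(baseDirectory, src);
    }
}
```

Inside GetImage, the line that prepends the desktop path could then be replaced by a call such as src = ImageSourceSketch.Resolve(src, someBaseDirectory), with the base directory chosen by you.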
The main code loop would now look something like this:
string html = @"<img src=""Untitled-1.png"" />";
string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "HtmlTest.pdf");
using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.A4, 50, 50, 80, 100)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
using (StringReader sr = new StringReader(html)) {
System.Collections.Generic.Dictionary<string, object> providers = new System.Collections.Generic.Dictionary<string, object>();
providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));
var parsedHtmlElements = HTMLWorker.ParseToList(sr, null, providers);
foreach (var htmlElement in parsedHtmlElements) {
doc.Add(htmlElement as IElement);
}
}
doc.Close();
}
}
}
One last thing: make sure to specify which version of iTextSharp you are targeting when posting here. The code above targets iTextSharp 5.1.2.0, but I think you might be using the 4.X series.

I faced the same problem and tried the following proposed solutions: string-replacing the img tag, encoding the image in base64, and embedding the image in a .NET class library. None of them worked!
So I fell back on the old-fashioned solution: adding the logo manually with doc.Add().
Here's your code updated:
string html = @"<img src=""Untitled-1.png"" />";
string outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "HtmlTest.pdf");
using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.A4, 50, 50, 80, 100)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
using (StringReader sr = new StringReader(html)) {
System.Collections.Generic.Dictionary<string, object> providers = new System.Collections.Generic.Dictionary<string, object>();
providers.Add(HTMLWorker.IMG_PROVIDER, new ImageThing(doc));
var parsedHtmlElements = HTMLWorker.ParseToList(sr, null, providers);
foreach (var htmlElement in parsedHtmlElements) {
doc.Add(htmlElement as IElement);
}
// here's the magic
var logo = iTextSharp.text.Image.GetInstance(Server.MapPath("~/HTMLTemplate/logo.png"));
logo.SetAbsolutePosition(440, 800);
doc.Add(logo);
// end
}
doc.Close();
}
}
}

string siteUrl = HttpContext.Current.Server.MapPath("/images/image/ticket/Ticket.jpg");
string HTML = "<table><tr><td><u>asdasdsadasdsa <img src='" + siteUrl + "' al='tt' /> </u></td></tr></table>";

DocumentFormat.OpenXml Adding an Image to a word doc

I am creating a simple word doc, using the openXml SDK.
It is working so far.
Now how can I add an image from my file system to this doc? I don't care where it is in the doc just so it is there.
Thanks!
Here is what I have so far.
string fileName = "proposal"+dealerId +Guid.NewGuid().ToString()+".doc";
string filePath = @"C:\DWSApplicationFiles\Word\" + fileName;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Create(filePath, WordprocessingDocumentType.Document, true))
{
MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
mainPart.Document = new Document();
//create the body
Body body = new Body();
DocumentFormat.OpenXml.Wordprocessing.Paragraph p = new DocumentFormat.OpenXml.Wordprocessing.Paragraph();
DocumentFormat.OpenXml.Wordprocessing.Run runParagraph = new DocumentFormat.OpenXml.Wordprocessing.Run();
DocumentFormat.OpenXml.Wordprocessing.Text text_paragraph = new DocumentFormat.OpenXml.Wordprocessing.Text("This is a test");
runParagraph.Append(text_paragraph);
p.Append(runParagraph);
body.Append(p);
mainPart.Document.Append(body);
mainPart.Document.Save();
}

Here is a method that can be simpler than the one described in the msdn page posted above, this code is in C++/CLI but of course you can write the equivalent in C#
WordprocessingDocument^ doc = WordprocessingDocument::Open(doc_name, true);
FileStream^ img_fs = gcnew FileStream(image_path, FileMode::Open);
ImagePart^ image_part = doc->MainDocumentPart->AddImagePart(ImagePartType::Jpeg);
image_part->FeedData(img_fs);
Run^ img_run = doc->MainDocumentPart->Document->Body->AppendChild(gcnew Paragraph())->AppendChild(gcnew Run());
Vml::ImageData^ img_data = img_run->AppendChild(gcnew Picture())->AppendChild(gcnew Vml::Shape())->AppendChild(gcnew Vml::ImageData());
img_data->RelationshipId = doc->MainDocumentPart->GetIdOfPart(image_part);
doc->Close();

This code worked for me: http://msdn.microsoft.com/en-us/library/bb497430.aspx
Your code adds the image to your docx package, but in order to see it in the document you have to declare it in your document.xml, i.e. link it to your physical image. That's why you have to write the long function listed in the MSDN link.
My problem is how to add effects to pictures (editing, cropping, background removal).
If you know how to do this I'd appreciate your help :)

How to: Add an Image Part to an Office Open XML Package by Using the Open XML API
http://msdn.microsoft.com/en-us/library/bb497430(v=office.12).aspx
public static void AddImagePart(string document, string fileName)
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Jpeg);
using (FileStream stream = new FileStream(fileName, FileMode.Open))
{
imagePart.FeedData(stream);
}
}
}

How to flatten already filled out PDF form using iTextSharp

I'm using iTextSharp to merge a number of pdf files together into a single file.
I'm using method described in iTextSharp official tutorials, specifically here, which merges files page by page via PdfWriter and PdfImportedPage.
Turns out some of the files I need to merge are filled out PDF Forms and using this method of merging form data is lost.
I've see several examples of using PdfStamper to fill out forms and flatten them.
What I can't find, is a way to flatten already filled out PDF Form and hopefully merge it with the other files without saving it flattened out version first.
Thanks

Just setting FormFlattening on the PdfStamper wasn't quite enough. I ended up creating a PdfReader from the byte array of the file contents, stamping/flattening it into a new byte array, and feeding that into a new PdfReader. Below is how I did it. It works great now.
private void AppendPdfFile(FileDTO file, PdfContentByte cb, iTextSharp.text.Document printDocument, PdfWriter iwriter)
{
var reader = new PdfReader(file.FileContents);
if (reader.AcroForm != null)
reader = new PdfReader(FlattenPdfFormToBytes(reader,file.FileID));
AppendFilePages(reader, printDocument, iwriter, cb);
}
private byte[] FlattenPdfFormToBytes(PdfReader reader, Guid fileID)
{
var memStream = new MemoryStream();
var stamper = new PdfStamper(reader, memStream) {FormFlattening = true};
stamper.Close();
return memStream.ToArray();
}

When creating the files to be merged, I changed this setting:
pdfStamper.FormFlattening = true;
Works Great.

I think this problem is the same as this one: AcroForm values missing after flattening
Based on the answer, this should do the trick:
pdfStamper.FormFlattening = true;
pdfStamper.AcroFields.GenerateAppearances = true;

This is the same answer as the accepted one but without any unused variables:
private byte[] GetUnEditablePdf(byte[] fileContents)
{
byte[] newFileContents = null;
var reader = new PdfReader(fileContents);
if (reader.AcroForm != null)
newFileContents = FlattenPdfFormToBytes(reader);
else newFileContents = fileContents;
return newFileContents;
}
private byte[] FlattenPdfFormToBytes(PdfReader reader)
{
var memStream = new MemoryStream();
var stamper = new PdfStamper(reader, memStream) { FormFlattening = true };
stamper.Close();
return memStream.ToArray();
}

Combine two (or more) PDF's

Background: I need to provide a weekly report package for my sales staff. This package contains several (5-10) crystal reports.
Problem:
I would like to allow a user to run all reports and also just run a single report. I was thinking I could do this by creating the reports and then doing:
List<ReportClass> reports = new List<ReportClass>();
reports.Add(new WeeklyReport1());
reports.Add(new WeeklyReport2());
reports.Add(new WeeklyReport3());
<snip>
foreach (ReportClass report in reports)
{
report.ExportToDisk(ExportFormatType.PortableDocFormat, @"c:\reports\" + report.ResourceName + ".pdf");
}
This would provide me a folder full of the reports, but I would like to email everyone a single PDF with all the weekly reports. So I need to combine them.
Is there an easy way to do this without installing any more third-party controls? I already have DevExpress and CrystalReports, and I'd prefer not to add too many more.
Would it be best to combine them in the foreach loop or in a separate loop? (Or in some other way?)

I had to solve a similar problem and what I ended up doing was creating a small pdfmerge utility that uses the PDFSharp project which is essentially MIT licensed.
The code is dead simple, I needed a cmdline utility so I have more code dedicated to parsing the arguments than I do for the PDF merging:
using (PdfDocument one = PdfReader.Open("file1.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument two = PdfReader.Open("file2.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument outPdf = new PdfDocument())
{
CopyPages(one, outPdf);
CopyPages(two, outPdf);
outPdf.Save("file1and2.pdf");
}
void CopyPages(PdfDocument from, PdfDocument to)
{
for (int i = 0; i < from.PageCount; i++)
{
to.AddPage(from.Pages[i]);
}
}

Here is a single function that will merge X amount of PDFs using PDFSharp
using PdfSharp;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
public static void MergePDFs(string targetPath, params string[] pdfs) {
using(var targetDoc = new PdfDocument()){
foreach (var pdf in pdfs) {
using (var pdfDoc = PdfReader.Open(pdf, PdfDocumentOpenMode.Import)) {
for (var i = 0; i < pdfDoc.PageCount; i++)
targetDoc.AddPage(pdfDoc.Pages[i]);
}
}
targetDoc.Save(targetPath);
}
}

This is something that I figured out, and wanted to share with you, using PdfSharp.
Here you can join multiple Pdfs in one, without the need of an output directory (following the input list order)
public static byte[] MergePdf(List<byte[]> pdfs)
{
List<PdfSharp.Pdf.PdfDocument> lstDocuments = new List<PdfSharp.Pdf.PdfDocument>();
foreach (var pdf in pdfs)
{
lstDocuments.Add(PdfReader.Open(new MemoryStream(pdf), PdfDocumentOpenMode.Import));
}
using (PdfSharp.Pdf.PdfDocument outPdf = new PdfSharp.Pdf.PdfDocument())
{
for(int i = 1; i<= lstDocuments.Count; i++)
{
foreach(PdfSharp.Pdf.PdfPage page in lstDocuments[i-1].Pages)
{
outPdf.AddPage(page);
}
}
MemoryStream stream = new MemoryStream();
outPdf.Save(stream, false);
byte[] bytes = stream.ToArray();
return bytes;
}
}

I used iTextsharp with c# to combine pdf files. This is the code I used.
string[] lstFiles=new string[3];
lstFiles[0]=@"C:/pdf/1.pdf";
lstFiles[1]=@"C:/pdf/2.pdf";
lstFiles[2]=@"C:/pdf/3.pdf";
PdfReader reader = null;
Document sourceDocument = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage;
string outputPdfPath=@"C:/pdf/new.pdf";
sourceDocument = new Document();
pdfCopyProvider = new PdfCopy(sourceDocument, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));
//Open the output file
sourceDocument.Open();
try
{
//Loop through the files list
for (int f = 0; f < lstFiles.Length; f++)
{
int pages =get_pageCcount(lstFiles[f]);
reader = new PdfReader(lstFiles[f]);
//Add pages of current file
for (int i = 1; i <= pages; i++)
{
importedPage = pdfCopyProvider.GetImportedPage(reader, i);
pdfCopyProvider.AddPage(importedPage);
}
reader.Close();
}
//At the end save the output file
sourceDocument.Close();
}
catch (Exception ex)
{
throw; //rethrow without resetting the stack trace
}
private int get_pageCcount(string file)
{
using (StreamReader sr = new StreamReader(File.OpenRead(file)))
{
Regex regex = new Regex(@"/Type\s*/Page[^s]");
MatchCollection matches = regex.Matches(sr.ReadToEnd());
return matches.Count;
}
}

Here is an example using iTextSharp
public static void MergePdf(Stream outputPdfStream, IEnumerable<string> pdfFilePaths)
{
using (var document = new Document())
using (var pdfCopy = new PdfCopy(document, outputPdfStream))
{
pdfCopy.CloseStream = false;
try
{
document.Open();
foreach (var pdfFilePath in pdfFilePaths)
{
using (var pdfReader = new PdfReader(pdfFilePath))
{
pdfCopy.AddDocument(pdfReader);
pdfReader.Close();
}
}
}
finally
{
document?.Close();
}
}
}
The PdfReader constructor has many overloads. It's possible to replace the parameter type IEnumerable<string> with IEnumerable<Stream>, and it should work as well. Please note that the method does not close the output stream; it leaves that task to whoever created the stream.

PDFsharp seems to allow merging multiple PDF documents into one.
And the same is also possible with ITextSharp.

Combining two byte[] using iTextSharp up to version 5.x:
internal static MemoryStream mergePdfs(byte[] pdf1, byte[] pdf2)
{
MemoryStream outStream = new MemoryStream();
using (Document document = new Document())
using (PdfCopy copy = new PdfCopy(document, outStream))
{
document.Open();
copy.AddDocument(new PdfReader(pdf1));
copy.AddDocument(new PdfReader(pdf2));
}
return outStream;
}
Instead of byte[] arguments, it's also possible to pass Streams.

There are some good answers here already, but I thought I might mention that pdftk might be useful for this task. Instead of producing one PDF directly, you could produce each PDF you need and then combine them as a post-process with pdftk. This could even be done from within your program using a system() or ShellExecute() call.
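From C#, the closest equivalent to a system() call is System.Diagnostics.Process. The sketch below assumes a pdftk executable is available on PATH and uses pdftk's "cat output" operation; the class and method names are made up for illustration, and the quoting is deliberately simple (paths containing double quotes are not handled).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class PdftkSketch
{
    // Builds "in1 in2 ... cat output out" with every path wrapped in double quotes.
    public static string BuildArguments(IEnumerable<string> inputs, string output)
    {
        var quoted = inputs.Select(p => "\"" + p + "\"");
        return string.Join(" ", quoted) + " cat output \"" + output + "\"";
    }

    // Assumption: "pdftk" can be found on PATH. Returns the process exit code.
    public static int Merge(IEnumerable<string> inputs, string output)
    {
        var startInfo = new ProcessStartInfo("pdftk", BuildArguments(inputs, output))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

With the report export loop from the question, the exported file paths would be collected into a list and passed to Merge once all reports have been written.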

You could try pdf-shuffler gtk-apps.org

I know a lot of people have recommended PDFsharp; however, it doesn't look like that project has been updated since June of 2008. Further, source isn't available.
Personally, I've been playing with iTextSharp which has been pretty easy to work with.

I combined the two above, because I needed to merge three PDF byte arrays and return a byte array.
internal static byte[] mergePdfs(byte[] pdf1, byte[] pdf2,byte[] pdf3)
{
MemoryStream outStream = new MemoryStream();
using (Document document = new Document())
using (PdfCopy copy = new PdfCopy(document, outStream))
{
document.Open();
copy.AddDocument(new PdfReader(pdf1));
copy.AddDocument(new PdfReader(pdf2));
copy.AddDocument(new PdfReader(pdf3));
}
return outStream.ToArray();
}

The following method takes a list of PDF byte arrays and returns a single merged byte array.
using ...;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
public static class PdfHelper
{
public static byte[] PdfConcat(List<byte[]> lstPdfBytes)
{
byte[] res;
using (var outPdf = new PdfDocument())
{
foreach (var pdf in lstPdfBytes)
{
using (var pdfStream = new MemoryStream(pdf))
using (var pdfDoc = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import))
for (var i = 0; i < pdfDoc.PageCount; i++)
outPdf.AddPage(pdfDoc.Pages[i]);
}
using (var memoryStreamOut = new MemoryStream())
{
outPdf.Save(memoryStreamOut, false);
res = Stream2Bytes(memoryStreamOut);
}
}
return res;
}
public static void DownloadAsPdfFile(string fileName, byte[] content)
{
var ms = new MemoryStream(content);
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.ContentType = "application/pdf";
HttpContext.Current.Response.AddHeader("content-disposition", $"attachment;filename={fileName}.pdf");
HttpContext.Current.Response.Buffer = true;
ms.WriteTo(HttpContext.Current.Response.OutputStream);
HttpContext.Current.Response.End();
}
private static byte[] Stream2Bytes(Stream input)
{
var buffer = new byte[input.Length];
using (var ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
ms.Write(buffer, 0, read);
return ms.ToArray();
}
}
}
So, the result of the PdfHelper.PdfConcat method is passed to the PdfHelper.DownloadAsPdfFile method.
PS: A NuGet package named PdfSharp needs to be installed. In the Package Manager Console window, type:
Install-Package PdfSharp

The following method merges two PDFs (f1 and f2) using iTextSharp. The second PDF is inserted after a specific page index of f1.
string f1 = "D:\\a.pdf";
string f2 = "D:\\Iso.pdf";
string outfile = "D:\\c.pdf";
appendPagesFromPdf(f1, f2, outfile, 3);
public static void appendPagesFromPdf(String f1,string f2, String destinationFile, int startingindex)
{
PdfReader p1 = new PdfReader(f1);
PdfReader p2 = new PdfReader(f2);
int l1 = p1.NumberOfPages, l2 = p2.NumberOfPages;
//Create our destination file
using (FileStream fs = new FileStream(destinationFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
Document doc = new Document();
PdfWriter w = PdfWriter.GetInstance(doc, fs);
doc.Open();
for (int page = 1; page <= startingindex; page++)
{
doc.NewPage();
w.DirectContent.AddTemplate(w.GetImportedPage(p1, page), 0, 0);
//Used to pull individual pages from our source
}// copied pages from first pdf till startingIndex
for (int i = 1; i <= l2;i++)
{
doc.NewPage();
w.DirectContent.AddTemplate(w.GetImportedPage(p2, i), 0, 0);
}// merges second pdf after startingIndex
for (int i = startingindex+1; i <= l1;i++)
{
doc.NewPage();
w.DirectContent.AddTemplate(w.GetImportedPage(p1, i), 0, 0);
}// continuing from where we left in pdf1
doc.Close();
p1.Close();
p2.Close();
}
}

To solve a similar problem I used iTextSharp like this:
//Create the document which will contain the combined PDFs
Document document = new Document();
//Create a writer for the document
PdfCopy writer = new PdfCopy(document, new FileStream(OutPutFilePath, FileMode.Create));
if (writer == null)
{
return;
}
//Open the document
document.Open();
//Get the files you want to combine
string[] filePaths = Directory.GetFiles(DirectoryPathWhereYouHaveYourFiles);
foreach (string filePath in filePaths)
{
//Read the PDF file
using (PdfReader reader = new PdfReader(filePath))
{
//Add the file to the combined one
writer.AddDocument(reader);
}
}
//Finally close the document and writer
writer.Close();
document.Close();

Here is a link to an example using PDFSharp and ConcatenateDocuments

Here is the solution: http://www.wacdesigns.com/2008/10/03/merge-pdf-files-using-c
It uses the free open source iTextSharp library: http://sourceforge.net/projects/itextsharp

I've done this with PDFBox. I suppose it works similarly to iTextSharp.
