Openxml merging docx with images - c#

In short: I would like to insert the content of a docx that contains images and bullets in another docx.
My problem: I tried two approaches, a manual merge and altChunk, and both produced a corrupted Word document.
If I remove the images from the docx that I want to insert, the resulting docx is fine.
My code:
Manual merge (thanks to https://stackoverflow.com/a/48870385/10075827):
private static void ManualMerge(string firstPath, string secondPath, string resultPath)
{
if (!System.IO.Path.GetFileName(firstPath).StartsWith("~$"))
{
File.Copy(firstPath, resultPath, true);
using (WordprocessingDocument result = WordprocessingDocument.Open(resultPath, true))
{
using (WordprocessingDocument secondDoc = WordprocessingDocument.Open(secondPath, false))
{
OpenXmlElement p = result.MainDocumentPart.Document.Body.Descendants<Paragraph>().Last();
foreach (var e in secondDoc.MainDocumentPart.Document.Body.Elements())
{
var clonedElement = e.CloneNode(true);
clonedElement.Descendants<DocumentFormat.OpenXml.Drawing.Blip>().ToList().ForEach(blip =>
{
var newRelation = result.CopyImage(blip.Embed, secondDoc);
blip.Embed = newRelation;
});
clonedElement.Descendants<DocumentFormat.OpenXml.Vml.ImageData>().ToList().ForEach(imageData =>
{
var newRelation = result.CopyImage(imageData.RelationshipId, secondDoc);
imageData.RelationshipId = newRelation;
});
result.MainDocumentPart.Document.Body.Descendants<Paragraph>().Last();
if (clonedElement is Paragraph)
{
p.InsertAfterSelf(clonedElement);
p = clonedElement;
}
}
}
}
}
}
public static string CopyImage(this WordprocessingDocument newDoc, string relId, WordprocessingDocument org)
{
var p = org.MainDocumentPart.GetPartById(relId) as ImagePart;
var newPart = newDoc.MainDocumentPart.AddPart(p);
newPart.FeedData(p.GetStream());
return newDoc.MainDocumentPart.GetIdOfPart(newPart);
}
Altchunk merge (from http://www.karthikscorner.com/sharepoint/use-altchunk-document-assembly/):
private static void AltchunkMerge(string firstPath, string secondPath, string resultPath)
{
WordprocessingDocument mainDocument = null;
MainDocumentPart mainPart = null;
var ms = new MemoryStream();
#region Prepare - consuming application
byte[] bytes = File.ReadAllBytes(firstPath);
ms.Write(bytes, 0, bytes.Length);
mainDocument = WordprocessingDocument.Open(ms, true);
mainPart = mainDocument.MainDocumentPart;
#endregion
#region Document to be imported
FileStream fileStream = new FileStream(secondPath, FileMode.Open);
#endregion
#region Merge
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, "AltChunkId101");
chunk.FeedData(fileStream);
var altChunk = new AltChunk(new AltChunkProperties() { MatchSource = new MatchSource() { Val = new OnOffValue(true) } });
altChunk.Id = "AltChunkId101";
mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
#endregion
#region Mark dirty
var listOfFieldChar = mainPart.Document.Body.Descendants<FieldChar>();
foreach (FieldChar current in listOfFieldChar)
{
if (string.Compare(current.FieldCharType, "begin", true) == 0)
{
current.Dirty = new OnOffValue(true);
}
}
#endregion
#region Save Merged Document
mainPart.DocumentSettingsPart.Settings.PrependChild(new UpdateFieldsOnOpen() { Val = new OnOffValue(true) });
mainDocument.Close();
FileStream file = new FileStream(resultPath, FileMode.Create, FileAccess.Write);
ms.WriteTo(file);
file.Close();
ms.Close();
#endregion
}
I spent hours searching for a solution and the most common one I found was to use altchunk. So why is it not working in my case?

If you are able to use the Microsoft.Office.Interop.Word namespace, and able to put a bookmark in the file you want to merge into, you can take this approach:
using Microsoft.Office.Interop.Word;
...
// merge by putting second file into bookmark in first file
private static void NewMerge(string firstPath, string secondPath, string resultPath, string firstBookmark)
{
var app = new Application();
var firstDoc = app.Documents.Open(firstPath);
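// Bookmarks[...] returns a Bookmark; its Range exposes Collapse and InsertFile
var bookmarkRange = firstDoc.Bookmarks[firstBookmark].Range;
// Collapse the range to its end so as not to overwrite the bookmark. Unsure if you need this
bookmarkRange.Collapse(WdCollapseDirection.wdCollapseEnd);
// Insert into the selected range; GetFullPath leaves an absolute path unchanged
// and resolves a relative one against the current directory
bookmarkRange.InsertFile(System.IO.Path.GetFullPath(secondPath));
// Save under the result path and release Word
firstDoc.SaveAs2(resultPath);
firstDoc.Close();
app.Quit();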
}
Related:
C#: Insert and indent bullet points at bookmark in word document using Office Interop libraries

How to extract all pages and attachments from PDF to PNG

I am trying to create a process in .NET to convert a PDF and all its pages and attachments to PNGs. I am evaluating libraries and came across PDFiumSharp, but it is not working for me. Here is my code:
string Inputfile = "input.pdf";
string OutputFolder = "Output";
string fileName = Path.GetFileNameWithoutExtension(Inputfile);
using (PdfDocument doc = new PdfDocument(Inputfile))
{
for (int i = 0; i < doc.Pages.Count; i++)
{
var page = doc.Pages[i];
using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, false))
{
page.Render(bitmap);
var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
bitmap.Save(targetFile);
}
}
}
When I run this code, I get this exception:
screenshot of exception
Does anyone know how to fix this? Also does PDFiumSharp support extracting PDF attachments? If not, does anyone have any other ideas on how to achieve my goal?
PDFium does not look like it supports extracting PDF attachments. If you want to achieve your goal, then you can take a look at another library that supports both extracting PDF attachments as well as converting PDFs to PNGs.
I am an employee of the LEADTOOLS PDF SDK which you can try out via these 2 nuget packages:
https://www.nuget.org/packages/Leadtools.Pdf/
https://www.nuget.org/packages/Leadtools.Document.Sdk/
Here is some code that will convert a PDF + all attachments in the PDF to separate PNGs in an output directory:
SetLicense();
cache = new FileCache { CacheDirectory = "cache" };
List<LEADDocument> documents = new List<LEADDocument>();
if (!Directory.Exists(OutputDir))
Directory.CreateDirectory(OutputDir);
using var document = DocumentFactory.LoadFromFile("attachments.pdf", new LoadDocumentOptions { Cache = cache, LoadAttachmentsMode = DocumentLoadAttachmentsMode.AsAttachments });
if (document.Pages.Count > 0)
documents.Add(document);
foreach (var attachment in document.Attachments)
documents.Add(document.LoadDocumentAttachment(new LoadAttachmentOptions { AttachmentNumber = attachment.AttachmentNumber }));
ConvertDocuments(documents, RasterImageFormat.Png);
And the ConvertDocuments method:
static void ConvertDocuments(IEnumerable<LEADDocument> documents, RasterImageFormat imageFormat)
{
using var converter = new DocumentConverter();
using var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
ocrEngine.Startup(null, null, null, null);
converter.SetOcrEngineInstance(ocrEngine, false);
converter.SetDocumentWriterInstance(new DocumentWriter());
foreach (var document in documents)
{
var name = string.IsNullOrEmpty(document.Name) ? "Attachment" : document.Name;
string outputFile = Path.Combine(OutputDir, $"{name}.{RasterCodecs.GetExtension(imageFormat)}");
int count = 1;
while (File.Exists(outputFile))
outputFile = Path.Combine(OutputDir, $"{name}({count++}).{RasterCodecs.GetExtension(imageFormat)}");
var jobData = new DocumentConverterJobData
{
Document = document,
Cache = cache,
DocumentFormat = DocumentFormat.User,
RasterImageFormat = imageFormat,
RasterImageBitsPerPixel = 0,
OutputDocumentFileName = outputFile,
};
var job = converter.Jobs.CreateJob(jobData);
converter.Jobs.RunJob(job);
}
}

Export DocX file with Footer using DocumentFormat.OpenXml

I want to generate a DocX file with a footer from HTML.
I am using the DocumentFormat.OpenXml library.
I manage to generate the DocX file, but without the footer.
The code I use is the following:
class HtmlToDoc
{
public static byte[] GenerateDocX(string html)
{
MemoryStream ms;
MainDocumentPart mainPart;
Body b;
Document d;
AlternativeFormatImportPart chunk;
AltChunk altChunk;
string altChunkID = "AltChunkId1";
ms = new MemoryStream();
using(var myDoc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document))
{
mainPart = myDoc.MainDocumentPart;
if (mainPart == null)
{
mainPart = myDoc.AddMainDocumentPart();
b = new Body();
d = new Document(b);
d.Save(mainPart);
}
chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkID);
using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
{
using (StreamWriter stringStream = new StreamWriter(chunkStream))
{
stringStream.Write("<html><head></head><body>" + html + "</body></html>");
}
}
altChunk = new AltChunk();
altChunk.Id = altChunkID;
mainPart.Document.Body.InsertAt(altChunk, 0);
AddFooter(myDoc);
mainPart.Document.Save();
}
return ms.ToArray();
}
private static void AddFooter(WordprocessingDocument doc)
{
string newFooterText = "New footer via Open XML Format SDK 2.0 classes";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
FooterPart newFooterPart = mainDocPart.AddNewPart<FooterPart>();
string rId = mainDocPart.GetIdOfPart(newFooterPart);
GeneratePageFooterPart(newFooterText).Save(newFooterPart);
foreach (SectionProperties sectProperties in
mainDocPart.Document.Descendants<SectionProperties>())
{
foreach (FooterReference footerReference in
sectProperties.Descendants<FooterReference>())
sectProperties.RemoveChild(footerReference);
FooterReference newFooterReference =
new FooterReference() { Id = rId, Type = HeaderFooterValues.Default };
sectProperties.Append(newFooterReference);
}
mainDocPart.Document.Save();
}
private static Footer GeneratePageFooterPart(string FooterText)
{
PositionalTab pTab = new PositionalTab()
{
Alignment = AbsolutePositionTabAlignmentValues.Center,
RelativeTo = AbsolutePositionTabPositioningBaseValues.Margin,
Leader = AbsolutePositionTabLeaderCharValues.None
};
var elment =
new Footer(
new Paragraph(
new ParagraphProperties(
new ParagraphStyleId() { Val = "Footer" }),
new Run(pTab,
new Text(FooterText))
)
);
return elment;
}
}
I tried some other examples for generating the footer too, but the result was the same: the file is generated, but without a footer.
What could be the problem?
This is how you can add a footer to a docx file:
static void Main(string[] args)
{
using (WordprocessingDocument document =
WordprocessingDocument.Open("Document.docx", true))
{
MainDocumentPart mainDocumentPart = document.MainDocumentPart;
// Delete the existing footer parts
mainDocumentPart.DeleteParts(mainDocumentPart.FooterParts);
// Create a new footer part
FooterPart footerPart = mainDocumentPart.AddNewPart<FooterPart>();
// Get Id of footer part
string footerPartId = mainDocumentPart.GetIdOfPart(footerPart);
GenerateFooterPartContent(footerPart);
// Get SectionProperties and Replace FooterReference with new Id
IEnumerable<SectionProperties> sections =
mainDocumentPart.Document.Body.Elements<SectionProperties>();
foreach (var section in sections)
{
// Delete existing references to footers
section.RemoveAllChildren<FooterReference>();
// Create the new footer reference node (the schema requires the type attribute)
section.PrependChild<FooterReference>(
new FooterReference() { Id = footerPartId, Type = HeaderFooterValues.Default });
}
}
}
static void GenerateFooterPartContent(FooterPart part)
{
Footer footer1 = new Footer();
Paragraph paragraph1 = new Paragraph();
ParagraphProperties paragraphProperties1 = new ParagraphProperties();
ParagraphStyleId paragraphStyleId1 = new ParagraphStyleId() { Val = "Footer" };
paragraphProperties1.Append(paragraphStyleId1);
Run run1 = new Run();
Text text1 = new Text();
text1.Text = "Footer";
run1.Append(text1);
paragraph1.Append(paragraphProperties1);
paragraph1.Append(run1);
footer1.Append(paragraph1);
part.Footer = footer1;
}
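One thing to check in the question's case: the document there is created from scratch, so its body probably contains no SectionProperties element, and a loop over the existing sections then has nothing to attach the footer reference to. The following is a minimal sketch under that assumption, not a definitive fix. The names FooterSketch, EnsureFooter and CreateDocumentWithFooter are hypothetical; the calls themselves are from the DocumentFormat.OpenXml package.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class FooterSketch
{
    // Hypothetical helper: attaches a default footer and creates the
    // section properties element when the body does not have one yet.
    public static void EnsureFooter(MainDocumentPart mainPart, string footerText)
    {
        FooterPart footerPart = mainPart.AddNewPart<FooterPart>();
        footerPart.Footer = new Footer(
            new Paragraph(new Run(new Text(footerText))));
        footerPart.Footer.Save();

        Body body = mainPart.Document.Body;
        // The body-level sectPr has to be the last child of the body.
        SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
        if (sectPr == null)
        {
            sectPr = body.AppendChild(new SectionProperties());
        }

        // Footer references come first inside sectPr, hence PrependChild.
        sectPr.RemoveAllChildren<FooterReference>();
        sectPr.PrependChild(new FooterReference
        {
            Type = HeaderFooterValues.Default,
            Id = mainPart.GetIdOfPart(footerPart)
        });
        mainPart.Document.Save();
    }

    // Builds a one-paragraph document in memory and adds the footer to it.
    public static byte[] CreateDocumentWithFooter(string bodyText, string footerText)
    {
        using (var ms = new MemoryStream())
        {
            using (var doc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document))
            {
                MainDocumentPart mainPart = doc.AddMainDocumentPart();
                mainPart.Document = new Document(
                    new Body(new Paragraph(new Run(new Text(bodyText)))));
                EnsureFooter(mainPart, footerText);
            }
            // Read the bytes only after the document has been disposed.
            return ms.ToArray();
        }
    }
}
```

In the question's GenerateDocX, a helper like EnsureFooter(mainPart, "...") would be called after the altChunk has been inserted, so that the section properties stay at the end of the body.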

C# : Editing/saving/Sending a docx document

I've been struggling with a lot of problems. Using OpenXML on an ASP.NET Core server, I want to create a new docx document based on a template. Once this document is fully saved, I want to send it to my client so he can download it directly. Here's my code:
public IActionResult Post([FromBody] Consultant consultant)
{
using (Stream templateStream = new MemoryStream(Properties.Resources.templateDossierTech))
using (WordprocessingDocument template =
WordprocessingDocument.Open(templateStream, false))
{
string fileName = environment.WebRootPath + @"\Resources\"+ consultant.FirstName + "_" + consultant.LastName + ".docx";
WordprocessingDocument dossierTechniqueDocument =
WordprocessingDocument.Create(fileName,
WordprocessingDocumentType.Document);
foreach (var part in template.Parts)
{
dossierTechniqueDocument.AddPart(part.OpenXmlPart, part.RelationshipId);
}
var body = dossierTechniqueDocument.MainDocumentPart.Document.Body;
var paras = body.Elements();
foreach (var para in paras)
{
foreach (var run in para.Elements())
{
foreach (var text in run.Elements())
{
if (text.InnerText.Contains("{{prenom}}"))
{
var t = new Text(text.InnerText.Replace("{{prenom}}", consultant.FirstName));
run.RemoveAllChildren<Text>();
run.AppendChild(t);
}
}
}
}
dossierTechniqueDocument.MainDocumentPart.Document.Save();
dossierTechniqueDocument.Close();
var cd = new System.Net.Mime.ContentDisposition
{
FileName = consultant.FirstName + "_" + consultant.LastName + ".docx",
Inline = true
};
Response.Headers.Add("Content-Disposition", cd.ToString());
Response.Headers.Add("X-Content-Type-Options", "nosniff");
return File(System.IO.File.ReadAllBytes(fileName),"application/vnd.openxmlformats-officedocument.wordprocessingml.document","Dossier Technique");
}
}
At first glance it seems to save correctly, but when I try to open the file in Word, it says it is corrupted for some reason.
I have the same problem when I try to send it: once it is sent, my client doesn't download it (Ajax query).
Does anyone have an idea how to fix this?
Here is the function which creates a document from a template:
static void GenerateDocumentFromTemplate(string inputPath, string outputPath)
{
MemoryStream documentStream;
using (Stream stream = File.OpenRead(inputPath))
{
documentStream = new MemoryStream((int)stream.Length);
stream.CopyTo(documentStream);
documentStream.Position = 0L;
}
using (WordprocessingDocument template = WordprocessingDocument.Open(documentStream, true))
{
template.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
MainDocumentPart mainPart = template.MainDocumentPart;
mainPart.DocumentSettingsPart.AddExternalRelationship("http://schemas.openxmlformats.org/officeDocument/2006/relationships/attachedTemplate",
new Uri(inputPath, UriKind.Absolute));
mainPart.Document.Save();
}
File.WriteAllBytes(outputPath, documentStream.ToArray());
}
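The placeholder replacement from the question can be done on the same kind of in-memory copy. The sketch below is only an illustration: TemplateFillSketch and Fill are hypothetical names, and it assumes each placeholder sits inside a single text element (Word often splits text across several runs, in which case this simple search will not find it).

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class TemplateFillSketch
{
    // Hypothetical helper: returns a copy of the template in which every
    // occurrence of the placeholder inside a single w:t element is replaced.
    public static byte[] Fill(byte[] templateBytes, string placeholder, string value)
    {
        using (var ms = new MemoryStream())
        {
            // Write into an expandable stream; a MemoryStream built
            // directly over the byte array could not grow.
            ms.Write(templateBytes, 0, templateBytes.Length);

            using (var doc = WordprocessingDocument.Open(ms, true))
            {
                var texts = doc.MainDocumentPart.Document.Descendants<Text>().ToList();
                foreach (Text text in texts)
                {
                    if (text.Text.Contains(placeholder))
                    {
                        text.Text = text.Text.Replace(placeholder, value);
                    }
                }
                doc.MainDocumentPart.Document.Save();
            }

            // The package is complete only after the document is disposed.
            return ms.ToArray();
        }
    }
}
```

In an ASP.NET Core action the returned array could then be passed straight to File(bytes, contentType, fileName), which avoids writing the document to disk first.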

Using PDFsharp and MigraDoc to write to and then read from a PDF

I'm trying to write verification code for our PDF generating routines, and I'm having difficulty getting PDFsharp to extract text from files created with MigraDoc. The ExtractText code works with other PDFs, but not with the PDFs that I generate with MigraDoc (see code below.)
Any tips on what I'm doing wrong?
//Create the Doc
var doc = new MigraDoc.DocumentObjectModel.Document();
doc.Info.Title = "VerifyReadWrite";
var section = doc.AddSection();
section.AddParagraph("ABCDEF abcdef");
//Render the PDF
var renderer = new PdfDocumentRenderer(true);
var pdf = new PdfDocument();
renderer.PdfDocument = pdf;
renderer.Document = doc;
renderer.RenderDocument();
var msOut = new MemoryStream();
pdf.Save(msOut, true);
var pdfBytes = msOut.ToArray();
//Read the PDF into PdfSharp
var ms = new MemoryStream(pdfBytes);
var pdfRead = PdfSharp.Pdf.IO.PdfReader.Open(ms, PdfDocumentOpenMode.ReadOnly);
var segments = pdfRead.Pages[0].ExtractText().ToList();
Results in the following:
segments[0] = "\0$\0%\0&\0'\0(\0)"
segments[1] = "\0D\0E\0F\0G\0H\0I"
I'd expect to see:
segments[0] = "ABCDEF"
segments[1] = "abcdef"
I'm using the ExtractText code from here:
C# Extract text from PDF using PdfSharp
and it works very well for all but PDFs generated with MigraDoc.
public static IEnumerable<string> ExtractText(this PdfPage page)
{
var content = ContentReader.ReadContent(page);
var text = content.ExtractText();
return text.Select(x => x.Trim());
}
public static IEnumerable<string> ExtractText(this CObject cObject)
{
if (cObject is COperator)
{
var cOperator = (COperator) cObject;
if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
cOperator.OpCode.Name == OpCodeName.TJ.ToString())
{
foreach (var cOperand in cOperator.Operands)
foreach (var txt in ExtractText(cOperand))
yield return txt;
}
}
else
{
var sequence = cObject as CSequence;
if (sequence != null)
{
var cSequence = sequence;
foreach (var element in cSequence)
foreach (var txt in ExtractText(element))
yield return txt;
}
else if (cObject is CString)
{
var cString = (CString) cObject;
yield return cString.Value;
}
}
}
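It seems the code used to extract text does not support all cases. The extracted strings look like two-byte glyph indices rather than character codes, which is what I would expect when the font is embedded with Unicode encoding.
Try new PdfDocumentRenderer(false) (instead of 'true'). AFAIK this will lead to a different encoding and the text extraction might work.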

Unable to set content in SdtElement using WordprocessingDocument

I have a template file in which I have placed two placeholders. Both are Plain Text Content Controls. The following code sets the values of the placeholders in the file.
static void Main(string[] args)
{
string fileName = "C:\\xxx\\Template.docx";
byte[] fileContent = File.ReadAllBytes(fileName);
using (MemoryStream memStream = new MemoryStream())
{
memStream.Write(fileContent, 0, (int)fileContent.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(memStream,true))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
var sdtElements = wordDoc.MainDocumentPart.Document.Descendants<SdtElement>();
foreach (SdtElement sdtElement in sdtElements)
{
Tag blockTag = sdtElement.SdtProperties.Descendants<Tag>().ElementAt(0);
Run nr = new Run();
Text txt = new Text();
txt.Text = "RKS";
nr.Append(txt);
Lock lckContent = new Lock();
bool lockControl = true;
if (lockControl)
{
lckContent.Val = LockingValues.SdtContentLocked;
}
else
{
lckContent.Val = LockingValues.Unlocked;
}
if (sdtElement is SdtBlock)
{
(((SdtBlock)sdtElement).SdtContentBlock.ElementAt(0)).RemoveAllChildren();
(((SdtBlock)sdtElement).SdtContentBlock.ElementAt(0)).AppendChild<Run>(nr);
((SdtBlock)sdtElement).SdtProperties.Append(lckContent);
}
if (sdtElement is SdtCell)
{
((SdtCell)sdtElement).SdtContentCell.ElementAt(0).Descendants<Paragraph>().ElementAt(0).RemoveAllChildren(); ((SdtCell)sdtElement).SdtContentCell.ElementAt(0).Descendants<Paragraph>().ElementAt(0).AppendChild<Run>(nr);
((SdtCell)sdtElement).SdtProperties.Append(lckContent);
}
if (sdtElement is SdtRun)
{
//SdtContentText text = sdtElement.SdtProperties.Elements<SdtContentText>().FirstOrDefault();
//((SdtRun)sdtElement).SdtContentRun.ElementAt(0).AppendChild<Text>(emptyTxt);
((SdtRun)sdtElement).SdtContentRun.ElementAt(0).RemoveAllChildren();
((SdtRun)sdtElement).SdtContentRun.ElementAt(0).AppendChild<Run>(nr);
((SdtRun)sdtElement).SdtProperties.Append(lckContent);
}
}
wordDoc.MainDocumentPart.Document.Save();
}
}
}
The code runs successfully but the changes are not reflected in the file.
What am I missing?
You are creating a WordprocessingDocument from a memory stream, so there is no way for the class to know which file to write to. It writes all changes to the memory stream, not the file.
You can create a WordprocessingDocument directly from a file by calling WordprocessingDocument.Open method and specifying the name of your file (see https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.packaging.wordprocessingdocument.aspx) and then the changes should be reflected in the file.
If you need to load the document into a buffer for some reason, then you need to copy the data from the buffer back to the file manually.
After some experiments I got it working, although I do not fully understand why. I just save the file under another name.
After the code line: wordDoc.MainDocumentPart.Document.Save(); I added
File.WriteAllBytes("C:\\xxx\\Sample.docx", memStream.ToArray());
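Copying the buffer back to a file, as described above, can be sketched as follows. This is a sketch under stated assumptions, not the original poster's code: InMemoryEditSketch and EditAndSave are hypothetical names, and the write is placed after the document is disposed so that all pending changes have reached the stream.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

static class InMemoryEditSketch
{
    // Hypothetical helper: loads a document into memory, applies the given
    // edit, and writes the result to outputPath (which may equal inputPath).
    public static void EditAndSave(string inputPath, string outputPath,
        Action<WordprocessingDocument> edit)
    {
        byte[] original = File.ReadAllBytes(inputPath);
        using (var ms = new MemoryStream())
        {
            ms.Write(original, 0, original.Length);

            using (var doc = WordprocessingDocument.Open(ms, true))
            {
                edit(doc);
            } // disposing the document flushes its changes into ms

            // Only now does the stream hold the complete, updated package.
            File.WriteAllBytes(outputPath, ms.ToArray());
        }
    }
}
```

The content-control code from the question would go into the edit callback, for example EditAndSave(fileName, fileName, doc => { ... }) to overwrite the template in place.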
