OpenXml to remove watermarks from Word, Excel and Powerpoint - c#

I'm new to OpenXml. I'm developing a Windows application in C#, and I need to remove the watermark (at run time) from a selected Word, Excel or PowerPoint file. The watermark was added manually by the user (don't ask me why he can't remove it manually; it's a customer request).
I created an "empty" .docx file (with "Hello Word!" in the body) and a "DRAFT" watermark. I implemented a simple application to remove it using the code from this topic (Code 1): Removing watermark in word with OpenXml & C# corrupts document
but the application throws a System.ArgumentOutOfRangeException.
The code is the following:
public Form1()
{
    InitializeComponent();
    string document = @"D:\Work\EsempioFiligrana\doc1.docx";
    // Open the file in editable mode.
    using (WordprocessingDocument wordprocessingDocument =
        WordprocessingDocument.Open(document, true))
    {
        DeleteCustomWatermark(wordprocessingDocument, "DRAFT");
    }
}
private static void DeleteCustomWatermark(WordprocessingDocument package, string watermarkId)
{
    MainDocumentPart maindoc = package.MainDocumentPart;
    if (maindoc != null)
    {
        var headers = maindoc.GetPartsOfType<HeaderPart>();
        if (headers != null)
        {
            var head = headers.First(); // we are sure that this header part contains the watermark with id=watermarkId
            var watermark = head.GetPartById(watermarkId); // !! This statement generates the exception !!
            if (watermark != null)
                head.DeletePart(watermark);
        }
    }
}
What's wrong? What can I do to remove the watermark from the document?
Thanks
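A likely cause, for what it's worth: GetPartById expects the relationship id that links a child part to its parent (typically something like "rId1"), not the watermark text, and it throws ArgumentOutOfRangeException when no part with that id exists. Below is a minimal sketch for listing the ids each header part actually holds, assuming the Open XML SDK is referenced. The names HeaderPartInspector and ListHeaderRelationships are hypothetical.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

static class HeaderPartInspector
{
    // Hypothetical helper: prints the relationship id and type of every
    // child part of every header, i.e. the values GetPartById would accept.
    public static void ListHeaderRelationships(string path)
    {
        // Read-only access is enough for inspection.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            MainDocumentPart main = doc.MainDocumentPart;
            if (main == null)
                return;

            foreach (HeaderPart header in main.HeaderParts)
            {
                Console.WriteLine("Header " + main.GetIdOfPart(header));
                foreach (IdPartPair pair in header.Parts)
                {
                    Console.WriteLine("  " + pair.RelationshipId + " -> " + pair.OpenXmlPart.GetType().Name);
                }
            }
        }
    }
}
```

A text watermark such as "DRAFT" is usually stored as markup inside the header rather than as a separate part, so the list may well be empty for such a document.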

Related

Drawings disappear after convert from word to pdf with Interop.Word

I have a Word file that contains some drawings.
But when I convert this file to PDF, the drawings disappear.
Here is my code:
string path2document = physical_path;
string path2pdf = physical_path.Replace(file_type, ".pdf");
var appWord = new Microsoft.Office.Interop.Word.Application();
var wordDocument = appWord.Documents.Open(path2document, ReadOnly: true);
var numberOfPages = wordDocument.ComputeStatistics(WdStatistic.wdStatisticPages, false);
if (System.IO.File.Exists(path2pdf) == false && numberOfPages <= 100)
{
    // Use one of the methods below
    wordDocument.ExportAsFixedFormat(path2pdf, Microsoft.Office.Interop.Word.WdExportFormat.wdExportFormatPDF);
    // or
    wordDocument.SaveAs2(path2pdf, Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF);
}
// Close the document and quit Word exactly once; a second Close()/Quit()
// after this point would fail because appWord has been set to null.
if (wordDocument != null)
{
    wordDocument.Close(WdSaveOptions.wdDoNotSaveChanges);
}
if (appWord != null)
{
    appWord.Quit(WdSaveOptions.wdDoNotSaveChanges);
    appWord = null;
}
I tried both SaveAs2 and ExportAsFixedFormat; the result is the same: no drawings in the PDF file.
I'd appreciate any help on this. Thank you in advance.
P.S. I use Microsoft Office Professional Plus 2013 on the server.
Interop requires Microsoft Office to be installed on the Windows server (or on whichever server you deploy the program to).
Interop literally runs the Office application as a background process.

Removing watermark in word with OpenXml & C# corrupts document

I have tried the following code blocks to delete the watermark from the document.
Code 1:
private static void DeleteCustomWatermark(WordprocessingDocument package, string watermarkId)
{
    MainDocumentPart maindoc = package.MainDocumentPart;
    if (maindoc != null)
    {
        var headers = maindoc.GetPartsOfType<HeaderPart>();
        if (headers != null)
        {
            var head = headers.First(); // we are sure that this header part contains the watermark with id=watermarkId
            var watermark = head.GetPartById(watermarkId);
            if (watermark != null)
                head.DeletePart(watermark);
        }
    }
}
Code 2:
public static void DeleteCustomWatermark(WordprocessingDocument package, string headerId)
{
    // headerId is the id of the header section which contains the watermark
    MainDocumentPart maindoc = package.MainDocumentPart;
    if (maindoc != null)
    {
        // FirstOrDefault returns null (instead of throwing) when no header matches
        var header = maindoc.HeaderParts.FirstOrDefault(i => maindoc.GetIdOfPart(i).Equals(headerId));
        if (header != null)
            maindoc.DeletePart(header);
    }
}
I have tried both code blocks. Each removes the watermark but leaves the document corrupted, so Word has to recover it afterwards. After recovery the documents are fine, but I want a proper solution that removes the watermark with C# code without corrupting the document. Please help.
Thanks
You also need to remove the "Picture" or "Drawing" in the header parts.
e.g.
List<Picture> pictures = new List<Picture>(headerPart.RootElement.Descendants<Picture>());
...
foreach (Picture p in pictures) {
    p.Remove();
}
...
headerPart.DeleteParts(imagePartList);
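Putting the suggestion above together, here is a minimal sketch that removes the watermark markup from every header instead of deleting whole parts. It assumes the watermark is stored as a Picture (w:pict) element whose inner XML contains the watermark text. The names WatermarkRemover and RemoveHeaderPictures are hypothetical, and any other header picture whose markup contains the same text would be removed too.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class WatermarkRemover
{
    // Hypothetical helper: removes every Picture element in the headers whose
    // inner XML contains watermarkText and returns how many were removed.
    public static int RemoveHeaderPictures(string path, string watermarkText)
    {
        int removed = 0;
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            MainDocumentPart main = doc.MainDocumentPart;
            if (main == null)
                return 0;

            foreach (HeaderPart headerPart in main.HeaderParts)
            {
                // Materialize the list first so elements can be removed while iterating.
                var pictures = headerPart.Header.Descendants<Picture>()
                    .Where(p => p.InnerXml.Contains(watermarkText))
                    .ToList();

                foreach (Picture p in pictures)
                {
                    p.Remove();
                    removed++;
                }

                // Write the modified header markup back to its part.
                headerPart.Header.Save();
            }
        }
        return removed;
    }
}
```

Because only elements inside the header markup are removed, the header parts and their relationships stay in place, which avoids leaving dangling references in the document.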

how to get Styles from existing word document by using Novacode.Docx?

This is example code using Open XML SDK 2.5:
void AddStylesPart()
{
    StyleDefinitionsPart styleDefinitionsPart = mainPart.StyleDefinitionsPart;
    styleDefinitionsPart = mainPart.AddNewPart<StyleDefinitionsPart>();
    Styles styles1 = new Styles();
    styles1.Save(styleDefinitionsPart);
    if (styleDefinitionsPart != null)
    {
        using (WordprocessingDocument wordTemplate = WordprocessingDocument.Open(@"..\AT\Docs\FPMaster-4DEV.docx", false))
        {
            foreach (var templateStyle in wordTemplate.MainDocumentPart.StyleDefinitionsPart.Styles)
            {
                styleDefinitionsPart.Styles.Append(templateStyle.CloneNode(true));
            }
        }
    }
}
Here an existing document is opened with the WordprocessingDocument class, and all the styles present in it are cloned.
I want to do the same using the Novacode.Docx DLL. How can I get the styles used in an existing document using Novacode.Docx? Please help.
I found an alternative solution; I hope this helps.
Using the Novacode.Docx DLL we can easily clone the styles used in the original document.
This can be done by creating a template from the original document.
Once that is done, apply the template in your project:
document.ApplyTemplate(@"..\TemplateFileName.dotx", false);
Now we can use all the styles present in the original document.

How to insert text into a content control with the Open XML SDK

I'm trying to develop a solution that takes input from an ASP.NET web page and embeds the input values into the corresponding content controls within an MS Word document. The Word document also contains static data, with some dynamic data to be embedded into the header and footer fields.
The idea is that the solution should be web based. Can I use OpenXML for this purpose, or is there another approach you can suggest?
Thank you very much in advance for all your valuable input. I really appreciate it.
I have a little code sample from my project, to insert a few words in a content control you've created in a Word document:
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, string text)
{
    SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>()
        .FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>()?.Val == contentControlTag);
    if (element == null)
        throw new ArgumentException($"ContentControlTag \"{contentControlTag}\" doesn't exist.");
    element.Descendants<Text>().First().Text = text;
    element.Descendants<Text>().Skip(1).ToList().ForEach(t => t.Remove());
    return doc;
}
It simply looks for the first content control in the document with a specific Tag (you can set that by enabling designer mode in Word and right-clicking on the content control), and replaces the current text with the text passed into the method. After this the document will of course still contain the content controls, which may not be desired. So when I'm done editing the document I run the following method to get rid of the content controls:
internal static WordprocessingDocument RemoveSdtBlocks(this WordprocessingDocument doc, IEnumerable<string> contentBlocks)
{
    List<SdtElement> SdtBlocks = doc.MainDocumentPart.Document.Descendants<SdtElement>().ToList();
    if (contentBlocks == null)
        return doc;
    foreach (var s in contentBlocks)
    {
        SdtElement currentElement = SdtBlocks.FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>()?.Val == s);
        if (currentElement == null)
            continue;
        IEnumerable<OpenXmlElement> elements = null;
        if (currentElement is SdtBlock)
            elements = (currentElement as SdtBlock).SdtContentBlock.Elements();
        else if (currentElement is SdtCell)
            elements = (currentElement as SdtCell).SdtContentCell.Elements();
        else if (currentElement is SdtRun)
            elements = (currentElement as SdtRun).SdtContentRun.Elements();
        // Skip content control types not handled above, otherwise the loop below would throw
        if (elements == null)
            continue;
        foreach (var el in elements)
            currentElement.InsertBeforeSelf(el.CloneNode(true));
        currentElement.Remove();
    }
    return doc;
}
To open the WordProcessingDocument from a template and edit it, there is plenty of information available online.
Edit:
A little sample code to open and save documents while working with them in a MemoryStream. Of course, in real code you should handle this with a separate repository class that manages the document:
byte[] byteArray = File.ReadAllBytes(@"C:\...\Template.dotx");
using (var stream = new MemoryStream())
{
    stream.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
    {
        // Needed because I'm working with a template dotx file,
        // remove this if the template is a normal docx.
        doc.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
        doc.InsertText("contentControlName", "testtesttesttest");
    }
    using (FileStream fs = new FileStream(@"C:\...\newFile.docx", FileMode.Create))
    {
        stream.WriteTo(fs);
    }
}

Chinese characters instead of text into the metadata called "Producer"

I have a problem when I edit the metadata of a pdf with iTextSharp.
I save a Word document as PDF with Word. The field called "Producer" is filled in by Word with the text "Microsoft® Word 2010". Afterwards, I edit the metadata with iTextSharp, and iTextSharp tries to edit this field in order to append the text "modified using iTextSharp 4.1.6".
The result is Producer(þÿMicrosoft® Word 2010; modified using iTextSharp 4.1.6 by 1T3XT). In Adobe Reader, the PDF Producer field in the document properties shows Chinese characters.
Adobe Reader can read the field if I manually remove the characters þÿ.
Do you know why I have this problem?
What can I do to solve it?
Just for reference, this works with iText 2.1.7. It is Java code, but probably works too for C#.
import java.io.File;
import java.io.FileOutputStream;
import org.junit.Test;
import com.lowagie.text.pdf.PdfDictionary;
import com.lowagie.text.pdf.PdfName;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfStamper;
import com.lowagie.text.pdf.PdfString;
public class AppTest {
    @Test
    public void testApp() throws Exception {
        PdfReader reader = new PdfReader(AppTest.class.getResourceAsStream("/msword2010.pdf"));
        FileOutputStream fos = new FileOutputStream(new File("target", "modified_msword2010.pdf"));
        PdfStamper stamper = new PdfStamper(reader, fos, '\0', true);
        PdfDictionary infoDict = stamper.getReader().getTrailer().getAsDict(PdfName.INFO);
        String producerCleaned = null;
        if (infoDict != null) {
            PdfString producer = (PdfString) infoDict.get(PdfName.PRODUCER);
            if (producer != null) {
                // Decode the stored string to Unicode and write it back as a clean PdfString
                producerCleaned = producer.toUnicodeString();
                PdfString cleanStrObj = new PdfString(producerCleaned);
                infoDict.put(PdfName.PRODUCER, cleanStrObj);
            }
        }
        stamper.close();
    }
}
