Word can't open document after adding CustomXmlPart - c#

I'm trying to add a CustomXmlPart to a .docm file, without success.
Apparently the file is too big (more than 10 MB) to be included in the package.
If the XML file is smaller than 7 MB, the document opens successfully.
Any ideas?
Thank you for your help.
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open("doc.docm", true))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
if (wordDoc.MainDocumentPart.CustomXmlParts != null)
{
wordDoc.MainDocumentPart.DeleteParts<CustomXmlPart>(wordDoc.MainDocumentPart.CustomXmlParts);
}
CustomXmlPart myXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (FileStream stream = new FileStream("10mbfile.xml", FileMode.Open))
{
myXmlPart.FeedData(stream);
}
wordDoc.Package.Flush();
}
EDIT: I found the issue. The XML file contained a lot of carriage return + line feed sequences. After removing them, I can embed the file as a CustomXmlPart.

The following unit test demonstrates that you can add very large custom XML parts (up to 30MB in the example) to a Word document:
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using Xunit;
namespace CodeSnippets.Tests.OpenXml.Wordprocessing
{
public class LargeCustomXmlPartsTests
{
public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
[Theory]
[InlineData(5)]
[InlineData(10)]
[InlineData(15)]
[InlineData(20)]
[InlineData(30)]
public void CanCreateLargeCustomXmlParts(int size)
{
int desiredStreamLength = size * 1024 * 1024;
string path = $"Document_{size:D2}MB.docm";
// Create a macro-enabled Word document with a custom XML part having
// at least the desired size in MB.
CreateMacroEnabledWordDocument(path, size);
// Assert that the document does have a custom XML part with at least
// the desired size.
using WordprocessingDocument wordDocument = WordprocessingDocument.Open(path, false);
CustomXmlPart customXmlPart = wordDocument.MainDocumentPart.CustomXmlParts.First();
using Stream stream = customXmlPart.GetStream(FileMode.Open, FileAccess.Read);
Assert.True(stream.Length > desiredStreamLength);
}
private static void CreateMacroEnabledWordDocument(string path, int size)
{
const WordprocessingDocumentType type = WordprocessingDocumentType.MacroEnabledDocument;
using WordprocessingDocument wordDocument = WordprocessingDocument.Create(path, type);
// Create a main document part with an empty document.
MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
WriteRootElement(mainDocumentPart,
new XElement(W + "document",
new XElement(W + "body",
new XElement(W + "p"))));
// Create a custom XML part with the desired size in MB.
CustomXmlPart customXmlPart = mainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
WriteRootElement(customXmlPart, CreatePartRootElement(size));
}
private static void WriteRootElement(OpenXmlPart part, XElement partRootElement)
{
using Stream stream = part.GetStream(FileMode.Create, FileAccess.Write);
using XmlWriter writer = XmlWriter.Create(stream);
partRootElement.WriteTo(writer);
}
private static XElement CreatePartRootElement(int size)
{
var random = new Random();
return new XElement("root",
Enumerable.Range(0, size).Select(paragraphIndex =>
new XElement("p",
new XAttribute("i", paragraphIndex),
Enumerable.Range(0, 1000).Select(runIndex =>
new XElement("r",
new XAttribute("i", runIndex),
new XElement("t", CreateRandomString(random)))))));
}
private static string CreateRandomString(Random random)
{
char[] value = Enumerable
.Range(0, 930)
.Select(i => Convert.ToChar(random.Next(33, 125)))
.ToArray();
return new string(value);
}
}
}
On my Windows 10 notebook, Microsoft Word for Office 365 opens the document with the 30MB custom XML part without any problems. Therefore, I'd say, your problem must be caused by other factors or a combination of factors, including any processing of the custom XML part performed by the VSTO add-in that was mentioned in a comment.

I found the issue. The XML file contained a lot of carriage return + line feed sequences. After removing them, I can embed the file as a CustomXmlPart.
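One way to strip the line breaks before embedding is to re-serialize the XML without formatting. The following is only a sketch: it assumes the carriage returns and line feeds are insignificant whitespace between elements (indentation), not part of text content, and the class and method names are hypothetical rather than taken from the original post.

```csharp
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class CompactXmlEmbedder
{
    // Hypothetical helper: embeds the XML file at xmlPath as a custom XML part
    // after removing the whitespace that only serves as formatting.
    public static void EmbedCompactXml(MainDocumentPart mainPart, string xmlPath)
    {
        // Loading without LoadOptions.PreserveWhitespace drops insignificant
        // whitespace (including CR/LF) between elements.
        XDocument xml = XDocument.Load(xmlPath);

        using (var buffer = new MemoryStream())
        {
            // DisableFormatting writes the document without indentation or line breaks.
            xml.Save(buffer, SaveOptions.DisableFormatting);
            buffer.Position = 0;

            CustomXmlPart part = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
            part.FeedData(buffer);
        }
    }
}
```

Line breaks inside text nodes are kept by this approach, so it would not help if those are what trips up Word.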

Related

Trying To Extract Embedded File Attachments From Existing PDF Using C# .NET And PDFBox 1.7.0

I am trying to extract embedded file attachments from an existing PDF using C# .NET and PDFBox.
The following is my code:
using System.Collections.Generic;
using System.IO;
using java.util; // IKVM Java for Microsoft .NET http://www.ikvm.net
using java.io; // IKVM Java for Microsoft .NET http://www.ikvm.net
using org.apache.pdfbox.pdmodel; // PDFBox 1.7.0 http://pdfbox.apache.org
using org.apache.pdfbox.pdmodel.common; // PDFBox 1.7.0 http://pdfbox.apache.org
using org.apache.pdfbox.pdmodel.common.filespecification; // PDFBox 1.7.0 http://pdfbox.apache.org
using org.apache.pdfbox.cos; // PDFBox 1.7.0 http://pdfbox.apache.org
namespace PDFClass
{
public class Class1
{
public Class1 ()
{
}
public void ReadPDFAttachments (string existingFileNameFullPath)
{
PDEmbeddedFilesNameTreeNode efTree;
PDComplexFileSpecification fs;
FileStream stream;
ByteArrayInputStream fakeFile;
PDDocument pdfDocument = new PDDocument();
PDEmbeddedFile ef;
PDDocumentNameDictionary names;
Map efMap = new HashMap();
pdfDocument = PDDocument.load(existingFileNameFullPath);
PDDocumentNameDictionary namesDictionary = new PDDocumentNameDictionary(pdfDocument.getDocumentCatalog());
PDEmbeddedFilesNameTreeNode embeddedFiles = namesDictionary.getEmbeddedFiles(); // some bug is currently preventing this call from working! >:[
if (embeddedFiles != null)
{
var aKids = embeddedFiles.getKids().toArray();
List<PDNameTreeNode> kids = new List<PDNameTreeNode>();
foreach (object oKid in aKids)
{
kids.Add(oKid as PDNameTreeNode);
}
if (kids != null)
{
foreach (PDNameTreeNode kid in kids)
{
PDComplexFileSpecification spec = (PDComplexFileSpecification)kid.getValue("ZUGFERD_XML_FILENAME");
PDEmbeddedFile file = spec.getEmbeddedFile();
fs = new PDComplexFileSpecification();
// Loop through each file for re-embedding
byte[] data = file.getByteArray();
int read = data.Length;
fakeFile = new ByteArrayInputStream(data);
ef = new PDEmbeddedFile(pdfDocument, fakeFile);
fs.setEmbeddedFile(ef);
efMap.put(kid.toString(), fs);
embeddedFiles.setNames(efMap);
names = new PDDocumentNameDictionary(pdfDocument.getDocumentCatalog());
((COSDictionary)efTree.getCOSObject()).removeItem(COSName.LIMITS); // Bug in PDFBox code requires we do this, or attachment will not embed. >:[
names.setEmbeddedFiles(embeddedFiles);
pdfDocument.getDocumentCatalog().setNames(names);
fs.getCOSDictionary().setString("Desc", kid.toString()); // adds a description to attachment in PDF attachment list
}
}
}
}
}
}
The variable embeddedFiles is always null, even though I put a breakpoint in the code and can see that the PDF file clearly has the attachment in it.
Any assistance would be greatly appreciated!

Get Data From .docx file like one big String in C#

I want to read the contents of a .docx file as a string from C# code. I looked through some related questions but couldn't tell which approach to use.
I'm trying to use ApplicationClass Application = new ApplicationClass(); but I get this error:
The type 'Microsoft.Office.Interop.Word.ApplicationClass' has no
constructors defined
I want to get the full text of my .docx file, not separate words.
foreach (FileInfo f in docFiles)
{
Application wo = new Application();
object nullobj = Missing.Value;
object file = f.FullName;
Document doc = wo.Documents.Open(ref file, .... . . ref nullobj);
doc.Activate();
doc. == ??
}
How can I get the whole text from a .docx file?
This is what I wanted; it extracts the whole text from the .docx file:
using (ZipFile zip = ZipFile.Read(filename))
{
MemoryStream stream = new MemoryStream();
zip.Extract(@"word/document.xml", stream);
stream.Seek(0, SeekOrigin.Begin);
XmlDocument xmldoc = new XmlDocument();
xmldoc.Load(stream);
string PlainTextContent = xmldoc.DocumentElement.InnerText;
}
Try the Word.Application interface instead of ApplicationClass.
Understanding Office Primary Interop Assembly Classes and Interfaces
The .docx format, like the other Microsoft Office formats whose extension ends in "x", is simply a ZIP package that you can open, modify, and recompress.
So use an Office Open XML library like this.
Enjoy.
Make sure you are using .NET Framework 4.5.
using NUnit.Framework;
[TestFixture]
public class GetDocxInnerTextTestFixture
{
private string _inputFilepath = @"../../TestFixtures/TestFiles/input.docx";
[Test]
public void GetDocxInnerText()
{
string documentText = DocxInnerTextReader.GetDocxInnerText(_inputFilepath);
Assert.IsNotNull(documentText);
Assert.IsTrue(documentText.Length > 0);
}
}
using System.IO;
using System.IO.Compression;
using System.Xml;
public static class DocxInnerTextReader
{
public static string GetDocxInnerText(string docxFilepath)
{
string folder = Path.GetDirectoryName(docxFilepath);
string extractionFolder = folder + "\\extraction";
if (Directory.Exists(extractionFolder))
Directory.Delete(extractionFolder, true);
ZipFile.ExtractToDirectory(docxFilepath, extractionFolder);
string xmlFilepath = extractionFolder + "\\word\\document.xml";
var xmldoc = new XmlDocument();
xmldoc.Load(xmlFilepath);
return xmldoc.DocumentElement.InnerText;
}
}
First, add references to these assemblies:
System.Xml
System.IO.Compression.FileSystem
Second, make sure these using directives are present in your class file:
using System.IO;
using System.IO.Compression;
using System.Xml;
Then you can use the code below:
public string DocxToString(string docxPath)
{
// Destination of your extraction directory
string extractDir = Path.GetDirectoryName(docxPath) + "\\" + Path.GetFileName(docxPath) + ".tmp";
// Delete old extraction directory
if (Directory.Exists(extractDir)) Directory.Delete(extractDir, true);
// Extract all media and XML documents to the destination directory
ZipFile.ExtractToDirectory(docxPath, extractDir);
XmlDocument xmldoc = new XmlDocument();
// Load the extracted XML file that contains all of the document text
xmldoc.Load(extractDir + "\\word\\document.xml");
// Delete extraction directory
Directory.Delete(extractDir, true);
// Read all text of your document from the XML
return xmldoc.DocumentElement.InnerText;
}
Enjoy...
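Since a .docx file is a ZIP package, the main document part can also be read straight from the archive without extracting anything to disk. This is a minimal sketch, assuming .NET Framework 4.5 or later with references to System.IO.Compression and System.IO.Compression.FileSystem; the class and method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class DocxArchiveTextReader
{
    // Hypothetical helper: returns the concatenated text of word/document.xml.
    public static string ReadInnerText(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // The main document part of a Word package lives at this path.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found in " + docxPath);
            }

            using (Stream stream = entry.Open())
            {
                var xmlDoc = new XmlDocument();
                xmlDoc.Load(stream);
                // InnerText concatenates all text nodes, as in the answers above.
                return xmlDoc.DocumentElement.InnerText;
            }
        }
    }
}
```

As with the extraction-based versions, InnerText runs paragraphs together without separators, so this fits the "one big string" requirement but not layout-sensitive uses.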

Why does my custom XML not carry over to a new version of a DOCX file when Word saves it?

I'm adding in some custom XML to a docx for tracking it inside an application I'm writing.
I've manually done it via opening the Word Document via a ZIP library, and via the official Open XML SDK route. Both have the same outcome of my XML being inserted into customXml folder in the document. The document opens fine in Word for both of these methods, and the XML is present.
But when I then save the document as, for example, MyDoc2.docx, all my XML disappears.
What am I doing wrong?
Microsoft links I've been following:
http://msdn.microsoft.com/en-us/library/bb608597.aspx
http://msdn.microsoft.com/en-us/library/bb608612.aspx
And the code I've taken from the Open XML SDK 2.0:
public static void AddNewPart(string document, string fileName)
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
CustomXmlPart myXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (FileStream stream = new FileStream(fileName, FileMode.Open))
{
myXmlPart.FeedData(stream);
}
}
}
Thanks,
John
OK, so I managed to find the article Using Custom XML Part as DataStore on openxmldeveloper.org, and have stripped out the unnecessary code so that it inserts and retains custom XML:
static void Main(string[] args)
{
using (WordprocessingDocument doc = WordprocessingDocument.Open("Test.docx", true, new OpenSettings()))
{
int customXmlPartsCount = doc.MainDocumentPart.GetPartsCountOfType<CustomXmlPart>();
if (customXmlPartsCount == 0)
{
CustomXmlPart customXmlPersonDataSourcePart = doc.MainDocumentPart.AddNewPart<CustomXmlPart>("application/xml", null);
using (FileStream stream = new FileStream("Test.xml", FileMode.Open))
{
customXmlPersonDataSourcePart.FeedData(stream);
}
CustomXmlPropertiesPart customXmlPersonPropertiesDataSourcePart = customXmlPersonDataSourcePart
.AddNewPart<CustomXmlPropertiesPart>("Rd3c4172d526e4b2384ade4b889302c76");
Ds.DataStoreItem dataStoreItem1 = new Ds.DataStoreItem() { ItemId = "{88e81a45-98c0-4d79-952a-e8203ce59aac}" };
customXmlPersonPropertiesDataSourcePart.DataStoreItem = dataStoreItem1;
}
}
}
So all the examples from Microsoft work as long as you don't modify the file. The problem appears to be that we don't set up the relationship with the main document.

Manipulating Word 2007 Document XML in C#

I am trying to manipulate the XML of a Word 2007 document in C#. I have managed to find and manipulate the node that I want but now I can't seem to figure out how to save it back. Here is what I am trying:
// Open the document from memoryStream
Package pkgFile = Package.Open(memoryStream, FileMode.Open, FileAccess.ReadWrite);
PackageRelationshipCollection pkgrcOfficeDocument = pkgFile.GetRelationshipsByType(strRelRoot);
foreach (PackageRelationship pkgr in pkgrcOfficeDocument)
{
if (pkgr.SourceUri.OriginalString == "/")
{
Uri uriData = new Uri("/word/document.xml", UriKind.Relative);
PackagePart pkgprtData = pkgFile.GetPart(uriData);
XmlDocument doc = new XmlDocument();
doc.Load(pkgprtData.GetStream());
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", nsUri);
XmlNodeList nodes = doc.SelectNodes("//w:body/w:p/w:r/w:t", nsManager);
foreach (XmlNode node in nodes)
{
if (node.InnerText == "{{TextToChange}}")
{
node.InnerText = "success";
}
}
if (pkgFile.PartExists(uriData))
{
// Delete the existing "/word/document.xml" part so it can be recreated
pkgFile.DeletePart(uriData);
}
PackagePart newPkgprtData = pkgFile.CreatePart(uriData, "application/xml");
StreamWriter partWrtr = new StreamWriter(newPkgprtData.GetStream(FileMode.Create, FileAccess.Write));
doc.Save(partWrtr);
partWrtr.Close();
}
}
pkgFile.Close();
I get the error 'Memory stream is not expandable'. Any ideas?
I would recommend that you use Open XML SDK instead of hacking the format by yourself.
Using OpenXML SDK 2.0, I do this:
public void SearchAndReplace(Dictionary<string, string> tokens)
{
using (WordprocessingDocument doc = WordprocessingDocument.Open(_filename, true))
ProcessDocument(doc, tokens);
}
private string GetPartAsString(OpenXmlPart part)
{
string text = String.Empty;
using (StreamReader sr = new StreamReader(part.GetStream()))
{
text = sr.ReadToEnd();
}
return text;
}
private void SavePart(OpenXmlPart part, string text)
{
using (StreamWriter sw = new StreamWriter(part.GetStream(FileMode.Create)))
{
sw.Write(text);
}
}
private void ProcessDocument(WordprocessingDocument doc, Dictionary<string, string> tokenDict)
{
ProcessPart(doc.MainDocumentPart, tokenDict);
foreach (var part in doc.MainDocumentPart.HeaderParts)
{
ProcessPart(part, tokenDict);
}
foreach (var part in doc.MainDocumentPart.FooterParts)
{
ProcessPart(part, tokenDict);
}
}
private void ProcessPart(OpenXmlPart part, Dictionary<string, string> tokenDict)
{
string docText = GetPartAsString(part);
foreach (var keyval in tokenDict)
{
Regex expr = new Regex(_starttag + keyval.Key + _endtag);
docText = expr.Replace(docText, keyval.Value);
}
SavePart(part, docText);
}
From this you could write a GetPartAsXmlDocument, do what you want with it, and then stream it back with SavePart(part, xmlString).
Hope this helps!
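The GetPartAsXmlDocument idea mentioned above could look roughly like the following. It is a sketch rather than the answerer's actual code: the class name and the SavePart overload taking an XmlDocument are assumptions, and it skips the string round trip by loading from and saving to the part stream directly.

```csharp
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

public static class PartXmlHelpers
{
    // Loads the content of a part into an XmlDocument.
    public static XmlDocument GetPartAsXmlDocument(OpenXmlPart part)
    {
        var doc = new XmlDocument();
        using (Stream stream = part.GetStream(FileMode.Open, FileAccess.Read))
        {
            doc.Load(stream);
        }
        return doc;
    }

    // Replaces the content of a part with the given XmlDocument.
    public static void SavePart(OpenXmlPart part, XmlDocument doc)
    {
        // FileMode.Create truncates the existing part content before writing.
        using (Stream stream = part.GetStream(FileMode.Create, FileAccess.Write))
        {
            doc.Save(stream);
        }
    }
}
```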
You should use the OpenXML SDK to work on docx files and not write your own wrapper.
Getting Started with the Open XML SDK 2.0 for Microsoft Office
Introducing the Office (2007) Open XML File Formats
How to: Manipulate Office Open XML Formats Documents
Manipulate Docx with C# without Microsoft Word installed with OpenXML SDK
The problem appears to be doc.Save(partWrtr), which is built using newPkgprtData, which is built using pkgFile, which loads from a memory stream... Because you loaded from a memory stream it's trying to save the document back to that same memory stream. This leads to the error you are seeing.
Instead of saving it to the memory stream try saving it to a new file or to a new memory stream.
The short and simple answer to the 'Memory stream is not expandable' issue is:
Do not open the document from a MemoryStream.
So in that respect the earlier answer is correct: simply open a file instead.
In my experience, opening a document from a MemoryStream and editing it easily leads to 'Memory stream is not expandable'.
I suppose the message appears when one makes edits that require the memory stream to expand.
I have found that I can make some edits, but nothing that adds to the size.
So, for example, deleting a custom XML part is OK, but adding one with some data is not.
So if you actually need to open a memory stream, you must figure out how to open an expandable MemoryStream if you want to add to it.
I have a need for this and hope to find a solution.
Stein-Tore Erdal
PS: just noticed the answer from "Jan 26 '11 at 15:18".
Don't think that is the answer in all situations.
I get the error when trying this:
var ms = new MemoryStream(bytes);
using (WordprocessingDocument wd = WordprocessingDocument.Open(ms, true))
{
...
using (MemoryStream msData = new MemoryStream())
{
xdoc.Save(msData);
msData.Position = 0;
ourCxp.FeedData(msData); // Memory stream is not expandable.
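A MemoryStream created with new MemoryStream(bytes) wraps the array at a fixed capacity and cannot grow, which is what produces this error. A stream created with the parameterless constructor is expandable, so copying the bytes into it first avoids the problem. The sketch below assumes the whole document fits comfortably in memory; the class name, method name, and the edit callback are hypothetical.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class ExpandableStreamExample
{
    // Hypothetical helper: applies an edit to an in-memory document and
    // returns the modified document as a new byte array.
    public static byte[] EditInMemory(byte[] bytes, Action<WordprocessingDocument> edit)
    {
        // The parameterless constructor creates an expandable stream.
        using (var ms = new MemoryStream())
        {
            ms.Write(bytes, 0, bytes.Length);
            ms.Position = 0;

            using (WordprocessingDocument wd = WordprocessingDocument.Open(ms, true))
            {
                edit(wd); // e.g. add a custom XML part and call FeedData on it
            }

            // Disposing the document flushes the package back into the stream.
            return ms.ToArray();
        }
    }
}
```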

Need to create a PDF file from C# with another PDF file as the background watermark

I am looking for a solution that will allow me to create a PDF output file from C# that also merges in a separate, static PDF file as the background watermark.
I'm working on a system that will allow users to create a PDF version of their invoice. Instead of trying to recreate all of the invoice features within C#, I think the easiest solution would be to use the PDF version of the blank invoice (created in Adobe Illustrator) as a background watermark and simply overlay the dynamic invoice details on top.
I was looking at Active Reports from Data Dynamics, but it does not appear to have the capability to overlay, or merge, a report onto an existing PDF file.
Is there any other .NET PDF report product that has this capability?
Thank you bhavinp. iText seems to do the trick and works exactly as I was hoping.
For anyone else trying to merge two PDF files and overlay them, the following example code based on the iTextPDF library might help.
The Result file is a combination of Original and Background.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace iTextTest
{
class Program
{
/** The original PDF file. */
const String Original = @"C:\Jobs\InvoiceSource.pdf";
const String Background = @"C:\Jobs\InvoiceTemplate.pdf";
const String Result = @"C:\Jobs\InvoiceOutput.pdf";
static void Main(string[] args)
{
ManipulatePdf(Original, Background, Result);
}
static void ManipulatePdf(String src, String stationery, String dest)
{
// Create readers
PdfReader reader = new PdfReader(src);
PdfReader sReader = new PdfReader(stationery);
// Create the stamper
PdfStamper stamper = new PdfStamper(reader, new FileStream(dest, FileMode.Create));
// Add the stationery to each page
PdfImportedPage page = stamper.GetImportedPage(sReader, 1);
int n = reader.NumberOfPages;
PdfContentByte background;
for (int i = 1; i <= n; i++)
{
background = stamper.GetUnderContent(i);
background.AddTemplate(page, 0, 0);
}
// Close the stamper
stamper.Close();
}
}
}
I came across this question and couldn't use the iTextSharp library because of the license on the free version:
The iText AGPL license is for developers who wish to share their entire application source code with the open-source community as free software under the AGPL “copyleft” terms.
However I found PDFSharp to work using the below code.
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
namespace PDFTest
{
class Program
{
static void Main(string[] args)
{
using (PdfDocument originalDocument= PdfReader.Open("C:\\MainDocument.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument outputPdf = new PdfDocument())
{
foreach (PdfPage page in originalDocument.Pages)
{
outputPdf.AddPage(page);
}
var background = XImage.FromFile("C:\\Watermark.pdf");
foreach (PdfPage page in outputPdf.Pages)
{
XGraphics graphics = XGraphics.FromPdfPage(page);
graphics.DrawImage(background, 1, 1);
}
// Write the combined document to disk
outputPdf.Save("C:\\OutputFile.pdf");
}
}
}
}
Use this.
http://itextpdf.com/
It works with both Java and .NET
