Descendants<T> gets zero elements in Word doc - c#

I am having trouble updating a hyperlink in a Word doc (see the question "How to update the body and a hyperlink in a Word doc") and have narrowed it down to the Descendants<T>() call not working. Here is my code:
using DocumentFormat.OpenXml.Packaging; //from NuGet ClosedXML
using DocumentFormat.OpenXml.Wordprocessing; //from NuGet ClosedXML
WordprocessingDocument doc = WordprocessingDocument.Open(...filename..., true);
MainDocumentPart mainPart = doc.MainDocumentPart;
IEnumerable<Hyperlink> hLinks = mainPart.Document.Body.Descendants<Hyperlink>();
The doc is opened OK because mainPart gets a value. But hLinks has no elements. If I open the Word doc in Word, a hyperlink is present and working.
In the Immediate Window I see the following values:
mainPart.Document.Body
-->
{DocumentFormat.OpenXml.Wordprocessing.Body}
ChildElements: {DocumentFormat.OpenXml.OpenXmlChildElements}
ExtendedAttributes: {DocumentFormat.OpenXml.EmptyEnumerable<DocumentFormat.OpenXml.OpenXmlAttribute>}
FirstChild: {DocumentFormat.OpenXml.OpenXmlUnknownElement}
HasAttributes: false
HasChildren: true
InnerText: "
lots of data, e.g:
...<w:t>100</w:t>...
mainPart.Document.Body.Descendants<Text>().First()
-->
Exception: "Sequence contains no elements"
If I cannot even find the text parts, how should I ever find and replace the hyperlink?

If you are sure the elements you are searching for with LINQ exist in your file, yet nothing is returned or you get exceptions, that typically points to a namespace problem.
If you post your entire file, I can better help you, but check to see if you can alias your namespace like so:
using W = DocumentFormat.OpenXml.Wordprocessing;
and then in your Descendants call you do something like this:
var hLinks = mainPart.Document.Body.Descendants<W.Hyperlink>();
This answer demonstrates another namespace trick to try also.

Something seems to be wrong with my Word doc; it was generated with a tool. Testing with another Word doc, created with Word, gives better results. I am working on it ...
With a regular Word doc, looking at
doc.MainDocumentPart.Document.Body.InnerXml
the value starts with:
<w:p w:rsidR=\"00455325\" w:rsidRDefault=\"00341915\"
xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">
<w:r>
<w:t>Hello World!
but with the Word doc I am testing with, which comes from a tool I made myself, it starts with:
<w:body xmlns:w=\"http://schemas.openxmlforma...
This explains a lot.
I will have to fix my tool :-)
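The `<w:body` at the start of Body.InnerXml suggests that a second body element ended up nested inside the real one. The SDK exposes such unexpected content as OpenXmlUnknownElement (compare the FirstChild value in the Immediate Window above), which would explain why typed queries like Descendants<Hyperlink>() come back empty. Below is a minimal sketch for spotting this condition. It assumes you already have the main document part's XML as a string, it uses System.Xml.Linq rather than the Open XML SDK, and the class and method names are made up for illustration:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class BodyNestingCheck
{
    // WordprocessingML main namespace, as seen in the InnerXml output above.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns true when the outer w:body has another w:body as a direct child.
    // (Hypothetical helper, not part of any library.)
    public static bool HasNestedBody(string documentXml)
    {
        XElement root = XDocument.Parse(documentXml).Root;
        XElement body = root?.Element(W + "body");
        return body != null && body.Elements(W + "body").Any();
    }
}
```

If this returns true for a generated file, the tool that produced it is probably inserting a whole body element where only the body's children belong.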
Update:
The cause was that this line did not extract the correct fragment to insert into the Word doc:
string strDocumentXml = newWordContent.DocumentElement.InnerXml;
but instead this is the correct data:
string strDocumentXml = newWordContent.DocumentElement.FirstChild.OuterXml;
Inspection with the debugger of:
doc.MainDocumentPart.Document.Body.InnerXml
as mentioned above, confirmed it. The Descendants call now returns the expected data, and updating the hyperlink works.
Side note:
I clearly fixed a bug in my app, but, apart from updating the hyperlink, the app worked perfectly OK before, with that bug :-)

Related


get height and width of docx file without using office

I get the page count with the following code:
using DocumentFormat.OpenXml.Packaging;
WordprocessingDocument doc = WordprocessingDocument.Open(@"D:\2pages.docx", false);
Console.WriteLine(
    doc.ExtendedFilePropertiesPart.Properties.Pages.InnerText.ToString()
);
Can I get the page height and width in the same way, or in some other way that does not require Office?
Aspose.Words (which is not free) has these things built in:
https://docs.aspose.com/display/wordsnet/Changing+Page+Setup+for+Whole+Document+using+Aspose.Words
Document doc = new Document();
// This truly makes the document empty. No sections (not possible in Microsoft Word).
doc.RemoveAllChildren();
// Create a new section node.
// Note that the section has not yet been added to the document,
// but we have to specify the parent document.
Section section = new Section(doc);
// Append the section to the document.
doc.AppendChild(section);
// Lets set some properties for the section.
section.PageSetup.SectionStart = SectionStart.NewPage;
section.PageSetup.PaperSize = PaperSize.Letter;
Someone had a similar problem and this is the discussion on SO (but with OpenXML):
Change Page size of Wor Document using Open Xml SDK 2.0
Maybe you can deduce your answer from this.
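For a route that needs neither Office nor a paid library: the page size is stored in the document XML itself. A section's w:sectPr element can carry a w:pgSz element whose w:w and w:h attributes give the width and height in twentieths of a point. The sketch below assumes you already have the main document part's XML as a string (for example, read from the part's stream), uses only System.Xml.Linq, and the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class PageSizeReader
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns (width, height) in points for every w:pgSz element that has
    // both w:w and w:h attributes. (Hypothetical helper for illustration.)
    public static List<(double Width, double Height)> ReadPageSizes(string documentXml)
    {
        return XDocument.Parse(documentXml)
            .Descendants(W + "pgSz")
            .Where(p => p.Attribute(W + "w") != null && p.Attribute(W + "h") != null)
            .Select(p => (
                Width: TwipsToPoints(p.Attribute(W + "w")),
                Height: TwipsToPoints(p.Attribute(W + "h"))))
            .ToList();
    }

    // The attribute values are in twentieths of a point, so divide by 20.
    private static double TwipsToPoints(XAttribute attribute) =>
        int.Parse(attribute.Value, CultureInfo.InvariantCulture) / 20.0;
}
```

A document can have several sections with different page sizes, which is why this returns a list rather than a single value.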
Please use PageSetup.PageHeight and PageSetup.PageWidth properties to get the page's height and width. Hope this helps you.
Document doc = new Document(MyDir + "input.docx");
Console.WriteLine(doc.FirstSection.PageSetup.PageHeight);
Console.WriteLine(doc.FirstSection.PageSetup.PageWidth);
I work with Aspose as Developer Evangelist.

System.Exception: Internal error 20, found element in OpenXmlPowerTools.RevisionAccepter.AcceptRevisions

I want to accept all the tracked changes in a Word document. I have written the following code to do so (I am using PowerTools from CodePlex):
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(filePath, true))
{
    OpenXmlPowerTools.RevisionAccepter.AcceptRevisions(wordDoc);
}
But the code above fails on some documents with System.Exception: Internal error 20, found element.
So is there an issue with my Word document? If so, what should I look for in it? In short, I want to know what is wrong with the document so that I can correct it and run the code above.
Note that I am able to accept the tracked changes in Word 2013/2010/2007 itself.
Any help would be highly appreciated.
I asked the same question on https://powertools.codeplex.com/discussions and they updated the RevisionAccepter.cs file to resolve this.
Thread from Power Tools Discussion

One or more XML expansion packs are available for this file

UPDATED:
I have now added the following code based on the answers below:
foreach (Word.XMLSchemaReference reference in Globals.ThisDocument.Application.ActiveDocument.XMLSchemaReferences)
{
    if (reference.NamespaceURI.Contains("ActionsPane"))
    {
        reference.Delete();
    }
}
This gives me no errors at design time, but the user still gets the message described in the original question about choosing an XML expansion pack. So the original problem hasn't been solved.
ORIGINAL QUESTION:
Using Visual Studio 2013, I have created a Word document-level project which has an actions pane. Everything works well. The only problem occurs when someone uses this document's actions pane to insert text into the document and then saves it. The next time that saved document is opened, the user gets the following message:
One or more XML expansion packs are available for this file.
Choose one from the list below.
No XML expansion pack
Microsoft Actions Pane 3
How do I stop this from happening when saved documents are opened?
You need to check the XMLSchemaReferences of the Word document to see if any of the XML schemas has a namespace referring to the actions pane and, if so, delete it.
This needs to be done before saving.
The message you get when opening the document is because it contains a schema reference to the action pane namespace.
Something like this :
foreach (XMLSchemaReference reference in wordDocument.XMLSchemaReferences)
{
    if (reference.NamespaceURI.Contains("ActionsPane"))
    {
        reference.Delete();
    }
}
where wordDocument is the actual word document you create.
If you don't have a reference to the word document and you just want to use the current document that has the focus, you can use Globals.ThisAddIn.Application.ActiveDocument instead of wordDocument in the code.
I've solved my problem using Huron's answer. Thanks, Huron.
Remove the XML schema reference from your active document.
In my case, I remove the reference in the after-mail-merge event:
void ThisApplication_MailMergeAfterMerge(Word.Document Doc, Word.Document DocResult)
{
    DocResult.Fields.Update();
    // remove customization
    Office.DocumentProperties properties = (Office.DocumentProperties) DocResult.CustomDocumentProperties;
    properties["_AssemblyName"].Delete();
    properties["_AssemblyLocation"].Delete();
    DocResult.RemoveDocumentInformation(Word.WdRemoveDocInfoType.wdRDIDocumentProperties);
    foreach (XMLSchemaReference reference in DocResult.XMLSchemaReferences)
    {
        if (reference.NamespaceURI.Contains("ActionsPane"))
        {
            reference.Delete();
        }
    }
    ThisApplication.Visible = true;
    ThisApplication.NormalTemplate.Saved = true;
    Doc.MailMerge.DataSource.Close();
}

YASR - Yet another search and replace question

Environment: asp.net c# openxml
Ok, so I've been reading a ton of snippets and trying to recreate the wheel, but I'm hoping that someone can help me get to my destination faster. I have multiple documents that I need to merge together... check... I'm able to do that with the Open XML SDK. Birds are singing, sun is shining so far. Now that I have the document the way I want it, I need to search and replace text and/or content controls.
I've tried using my own text - {replace this} - but when I look at the XML (rename the docx to zip and view the file), the { is nowhere near the text. So I either need to know how to protect that within the document so they don't diverge, or I need to find another way to search and replace.
I'm able to search/replace if it is an XML file, but then I'm back to not being able to combine the documents easily.
Code below... and as I mentioned... document merge works fine... just need to replace stuff.
* Update * changed my replace call to go after the tag instead of regex. I have the right info now, but the .Replace call doesn't seem to want to work. Last four lines are for validation that I was seeing the right tag contents. I simply want to replace those contents now.
protected void exeProcessTheDoc(object sender, EventArgs e)
{
    string doc1 = Server.MapPath("~/Templates/doc1.docx");
    string doc2 = Server.MapPath("~/Templates/doc2.docx");
    string final_doc = Server.MapPath("~/Templates/extFinal.docx");
    File.Delete(final_doc);
    File.Copy(doc1, final_doc);
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(final_doc, true))
    {
        string altChunkId = "AltChunkId2";
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        using (FileStream fileStream = File.Open(doc2, FileMode.Open))
            chunk.FeedData(fileStream);
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        mainPart.Document.Save();
    }
    exeSearchReplace(final_doc);
}
public static void GetPropertyFromDocument(string document, string outdoc)
{
    XmlDocument xmlProperties = new XmlDocument();
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
    {
        ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
        xmlProperties.Load(appPart.GetStream());
    }
    XmlNodeList chars = xmlProperties.GetElementsByTagName("Company");
    chars.Item(0).InnerText.Replace("{ClientName}", "Penn Inc.");
    StreamWriter sw;
    sw = File.CreateText(outdoc);
    sw.WriteLine(chars.Item(0).InnerText);
    sw.Close();
}
}
}
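A note on the .Replace call that "doesn't seem to want to work": in .NET, string.Replace returns a new string and leaves the original untouched, so the result has to be assigned back to InnerText for the node to change. A minimal sketch of that pattern, using only System.Xml; the class and method names are made up for illustration, and it only changes the in-memory XmlDocument (writing the XML back into the package is a separate step):

```csharp
using System.Xml;

public static class InnerTextReplaceDemo
{
    // Replaces a placeholder in the first element with the given tag name.
    // Returns the updated text, or null when no such element exists.
    // (Hypothetical helper for illustration.)
    public static string ReplaceInFirst(
        XmlDocument xml, string tagName, string placeholder, string value)
    {
        XmlNodeList nodes = xml.GetElementsByTagName(tagName);
        if (nodes.Count == 0)
        {
            return null;
        }

        XmlNode node = nodes[0];
        // string.Replace returns a new string; assign it back to update the node.
        node.InnerText = node.InnerText.Replace(placeholder, value);
        return node.InnerText;
    }
}
```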
If I'm reading this right, you have something like "{replace me}" in a .docx and then when you loop through the XML, you're finding things like <t>{replace</t><t> me</t><t>}</t> or some such havoc. Now, with XML like that, it's impossible to create a routine that will replace "{replace me}".
If that's the case, then it's very, very likely related to the fact that it's considered a proofing error. i.e. it's misspelled as far as Word is concerned. The cause of it is that you've opened the document in Word and have proofing turned on. As such, the text is marked as "isDirty" and split up into different runs.
There are two ways to fix this:
Client-side. In Word, just make sure all proofing errors are either corrected or ignored.
Format-side. Use the MarkupSimplifier tool that is part of Open XML Package Editor Power Tool for Visual Studio 2010 to fix this outside of the client. Eric White has a great (and timely for you - just a few days old) write up here on it: Getting Started with Open XML PowerTools Markup Simplifier
If you want to search and replace text in a WordprocessingML document, there is a fairly easy algorithm that you can use:
Break all runs into runs of a single character. This includes runs that have special characters such as a line break, carriage return, or hard tab.
It is then pretty easy to find a set of runs that match the characters in your search string.
Once you have identified a set of runs that match, then you can replace that set of runs with a newly created run (which has the run properties of the run containing the first character that matched the search string).
After replacing the single-character runs with a newly created run, you can then consolidate adjacent runs with identical formatting.
I've written a blog post and recorded a screen-cast that walks through this algorithm.
Blog post: http://openxmldeveloper.org/archive/2011/05/12/148357.aspx
Screen cast: http://www.youtube.com/watch?v=w128hJUu3GM
-Eric
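The four steps in the answer above can be sketched on a deliberately simplified model. This is not the Open XML SDK or the PowerTools implementation: the Run class below is a made-up stand-in for a w:r element that holds only text and a formatting key, and special characters such as breaks and tabs are not modelled.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a run: some text plus a key describing its formatting.
public sealed class Run
{
    public Run(string text, string format) { Text = text; Format = format; }
    public string Text { get; }
    public string Format { get; }
}

public static class RunSearchReplace
{
    public static List<Run> Replace(IEnumerable<Run> runs, string search, string replacement)
    {
        // Step 1: break every run into runs of a single character.
        List<Run> chars = runs
            .SelectMany(r => r.Text.Select(c => new Run(c.ToString(), r.Format)))
            .ToList();

        // Steps 2 and 3: find sequences of single-character runs that spell the
        // search string and swap each for one new run that takes the formatting
        // of the first matched character.
        var replaced = new List<Run>();
        int i = 0;
        while (i < chars.Count)
        {
            if (search.Length > 0 && MatchesAt(chars, i, search))
            {
                replaced.Add(new Run(replacement, chars[i].Format));
                i += search.Length;
            }
            else
            {
                replaced.Add(chars[i]);
                i++;
            }
        }

        // Step 4: consolidate adjacent runs with identical formatting.
        var merged = new List<Run>();
        foreach (Run run in replaced)
        {
            if (run.Text.Length == 0) continue; // an empty replacement leaves no run
            int last = merged.Count - 1;
            if (last >= 0 && merged[last].Format == run.Format)
                merged[last] = new Run(merged[last].Text + run.Text, run.Format);
            else
                merged.Add(run);
        }
        return merged;
    }

    private static bool MatchesAt(List<Run> chars, int start, string search)
    {
        if (start + search.Length > chars.Count) return false;
        for (int k = 0; k < search.Length; k++)
        {
            if (chars[start + k].Text[0] != search[k]) return false;
        }
        return true;
    }
}
```

With this model, a placeholder split across runs such as "{replace", " me" and "}" is matched as one unit, because the split points disappear once every run holds a single character.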
