Connecting Microsoft.Office.Interop.Word with DocumentOpenXml

I have to copy the selected text from the active document to another file (appended at the end of the target file). Both the source and the target are .docx files. The source file is open in Word and the user is working with it.
I would like to copy the selection without opening the target file as a Microsoft.Office.Interop.Word.Document and using copy-paste (for performance reasons).
I don't know how to convert the Selection in the open document into XML that DocumentFormat.OpenXml understands, or how to inject this XML into the target file.
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml.Packaging;
using Range = Microsoft.Office.Interop.Word.Range;
public void RangeToNewDocument(string documentPath, Range range)
{
    // Flat OPC XML of the selected range; how do I use it below?
    string selectedXML = range.WordOpenXML;
    using (WordprocessingDocument doc = WordprocessingDocument.Open(documentPath, true))
    {
        Body body = doc.MainDocumentPart.Document.Body;
        // body.Append(selectedXML); does not compile: Append expects OpenXmlElement objects, not a string
        doc.SaveAs(documentPath + ".RangeToNewDocumentTest.docx");
    }
}
Most example code shows how to add something new (such as Runs or whole Paragraphs), but I couldn't find anything about inserting existing content.
The only link I found is:
https://learn.microsoft.com/en-us/office/open-xml/how-to-copy-the-contents-of-an-open-xml-package-part-to-a-document-part-in-a-dif
but it replaces one ThemePart with another, and I have no idea how to adapt it to my case.
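One possible direction, sketched under stated assumptions: `Range.WordOpenXML` returns the range as a Flat OPC string (a `pkg:package` element whose `pkg:part` children hold the package parts inline), so the block-level content of the selection can be pulled out of the `/word/document.xml` part with LINQ to XML. The helper below (class and method names are hypothetical) returns the outer XML of each child of `w:body`, skipping the trailing `w:sectPr`. It assumes the selection is plain text and tables: images, hyperlinks, comments, numbering and styles refer to other parts or relationship IDs that would also have to be copied into the target package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlatOpcBody
{
    static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the outer XML of each block-level element (w:p, w:tbl, ...) found in the
    // main document part of a Flat OPC string such as Range.WordOpenXML.
    // The w:sectPr is skipped so the target keeps its own page setup.
    public static List<string> GetBodyChildrenXml(string flatOpc)
    {
        XDocument package = XDocument.Parse(flatOpc);
        XElement mainPart = package.Root
            .Elements(Pkg + "part")
            .FirstOrDefault(p => (string)p.Attribute(Pkg + "name") == "/word/document.xml")
            ?? throw new InvalidOperationException("No /word/document.xml part found.");

        XElement body = mainPart.Descendants(W + "body").First();
        return body.Elements()
            .Where(e => e.Name != W + "sectPr")
            // ToString adds the xmlns declarations the fragment needs
            .Select(e => e.ToString(SaveOptions.DisableFormatting))
            .ToList();
    }
}
```

In the question's method, each returned string could then be parsed with the outer-XML constructor of the matching SDK class (for example `new Paragraph(xml)` for a `w:p` item) and inserted into `body` before the target's own `w:sectPr`, which has to stay the last child of the body.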

Related

Duplicate Word Document Using OpenXML While Open Original Document

I need to create an identical copy of an existing Word document and open it as another instance while the original document remains open. The second document should not be saved automatically, but the user may choose to save it or not.
This needs to be done using OpenXML.
I have attached the current implementation below. It has several issues:
The first document has to be closed before it can be used in the WordprocessingDocument using statement.
The newly created second document has to be saved to a local folder.
Current code:
var doc = Globals.ThisAddIn.Application.ActiveDocument;
doc.Save();
string fileName = doc.FullName;
doc.Close();
// Note: Create() on an existing path replaces that file with a new, empty package
using (WordprocessingDocument document = WordprocessingDocument.Create(fileName, WordprocessingDocumentType.Document))
{
}
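A sketch of one way around the first issue, assuming Word's lock on the open file still allows shared reads (an open .docx can usually be copied in Explorer): after `doc.Save()`, read the saved bytes into an expandable `MemoryStream` without closing the document, and open that stream with `WordprocessingDocument.Open(stream, true)`. Only the last saved state is copied. The class and method names are hypothetical.

```csharp
using System.IO;

public static class OpenFileCopier
{
    // Reads a file that another process (e.g. Word) may still hold open into memory.
    // FileShare.ReadWrite | FileShare.Delete tolerates the other process's handle.
    public static MemoryStream ReadSharedCopy(string path)
    {
        var copy = new MemoryStream(); // expandable, so the SDK can open it for editing
        using (var source = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.ReadWrite | FileShare.Delete))
        {
            source.CopyTo(copy);
        }
        copy.Position = 0; // rewind so the package is read from the start
        return copy;
    }
}
```

The in-memory copy can be edited freely; to show it in a second Word window it still has to be written to some path (for example under `Path.GetTempPath()`), since Interop opens documents by file name.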

Why do you need to use OpenXML? With Interop you could simply:
Open the existing document
Copy everything within the document range
Create a new document
Paste the other document in the new one
It's done quickly and does the job perfectly

Wrong values trying to get words count from a Microsoft Word document with OpenXML?

I have a Word document and want to get its word count programmatically using the Open XML SDK.
I managed to get a word count, but Open XML returns wrong values.
Note that the test document mixes languages (Arabic and English); Arabic is an RTL language.
If you open the document in Microsoft Word, the UI gives you the correct number of words,
but if you read the value stored in the app.xml file of the same document, you get a different value.
I tried the code from this link:
https://msdn.microsoft.com/en-us/library/office/bb521237(v=office.14).aspx
// To retrieve the properties of a document part.
public static void GetPropertyFromDocument(string document)
{
    XmlDocument xmlProperties = new XmlDocument();
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
    {
        ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
        using (var stream = appPart.GetStream())
        {
            xmlProperties.Load(stream);
        }
    }
    // app.xml holds statistics cached by the application that last saved the file:
    // the word count is in <Words>, the character count in <Characters>.
    XmlNodeList words = xmlProperties.GetElementsByTagName("Words");
    MessageBox.Show("Number of words in the file = " +
        words.Item(0).InnerText, "Word Count");
}
For the file I tested, Word reports 13 words, but the code above gives me 11!
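The number in app.xml is only what the application that last saved the file cached there, so it can be stale or computed with different rules. A rough alternative is to count the words yourself from `word/document.xml`. The sketch below (class name hypothetical) uses only `System.IO.Compression` and LINQ to XML: it joins the `w:t` text of each paragraph and splits it on whitespace. It reads only the main document part (no headers, footers or footnotes), ignores tabs and breaks inside a paragraph, may count text boxes twice when they carry `mc:AlternateContent` fallbacks, and does not reproduce Word's own rules for punctuation or mixed RTL/LTR text, so it can still differ from Word's count.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class WordCounter
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts whitespace-separated words in the w:t text of each paragraph.
    public static int CountWords(XDocument documentXml)
    {
        return documentXml.Descendants(W + "t")
            // group text fragments by their nearest enclosing paragraph,
            // so a word split across runs is counted once
            .GroupBy(t => t.Ancestors(W + "p").FirstOrDefault())
            .Select(g => string.Concat(g.Select(t => t.Value)))
            // a null separator array means "split on any whitespace character"
            .Sum(text => text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length);
    }

    // Opens the .docx as a zip archive and counts words in word/document.xml
    // (assumes the main part sits at its usual location).
    public static int CountWords(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        using (Stream s = zip.GetEntry("word/document.xml").Open())
        {
            return CountWords(XDocument.Load(s));
        }
    }
}
```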

DocIO is a .NET library that can read, write and render Word 2003/2007/2010/2013/2016 files. Using DocIO library of Syncfusion, you can get the correct word count. The whole suite of controls is available for free (commercial applications also) through the community license program if you qualify. The community license is the full product with no limitations or watermarks.
Step 1: Create a console application
Step 2: Add references to Syncfusion.DocIO.Base, Syncfusion.Compression.Base and Syncfusion.OfficeChart.Base; you can also add these references to your project via NuGet.
Step 3: Copy & paste the following code snippet.
This code snippet computes the word count of the Word document, as per your requirement.
using System;
using System.IO;
using Syncfusion.DocIO;
using Syncfusion.DocIO.DLS;
namespace DocIO_MergeDocument
{
    class Program
    {
        static void Main(string[] args)
        {
            // Opens an existing Word document
            WordDocument document = new WordDocument(@"InputDocument.docx");
            // Updates the word count in the document
            document.UpdateWordCount(false);
            // Gets the updated word count
            int wordCount = document.BuiltinDocumentProperties.WordCount;
            Console.WriteLine("Word count: " + wordCount);
            // Releases the resources occupied by the WordDocument instance
            document.Dispose();
        }
    }
}
For further information about DocIO, please refer to our help documentation.
Note: I work for Syncfusion

Add text and picture in .docx file

I use an Office Word file as a template. It contains repeated default text and a photo that I have to replace with another photo and text.
How can I define specific zones in the template and then find those zones in C# to replace them?

I think the best way is to learn how to manipulate the Word XML structure to include the data you want.
For filling and altering templates you can use the Open XML SDK from Microsoft.
You can also follow a manual approach without using the SDK: you add a custom XML resource that contains your changes/resources for the template.
If you don't need to be that flexible, you can use the standard content controls / picture content controls in Word and replace them afterwards in C#; it depends on how flexible you want to be in replacing elements.
You can find a good and complete example of using picture content control here: Picture content control handling

OK, I finally tried this approach: use a Word file with content controls and an XML file to bind data to them.
For that I use the following code:
string outFile = @"D:\template_created.docx";
string docPath = @"D:\template.docx";
string xmlPath = @"D:\template.xml";
// Overwrite any output left over from a previous run
File.Copy(docPath, outFile, true);
using (WordprocessingDocument doc = WordprocessingDocument.Open(outFile, true))
{
    MainDocumentPart mdp = doc.MainDocumentPart;
    // Remove the existing custom XML parts (ToList(), from System.Linq, copies the list before deleting)
    mdp.DeleteParts(mdp.CustomXmlParts.ToList());
    // Add a new custom XML part and fill it from the data file
    CustomXmlPart cxp = mdp.AddCustomXmlPart(CustomXmlPartType.CustomXml);
    using (FileStream fs = new FileStream(xmlPath, FileMode.Open, FileAccess.Read))
    {
        cxp.FeedData(fs);
    }
    mdp.Document.Save();
}
When I run the app, it creates the custom XML part and appends it to my Word file. When I open the Word file there is no error, but none of the content controls are filled.

My final approach was to use content controls with unique IDs in my Word document. Then I can find those IDs in C# and replace the content.
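A minimal sketch of that approach, assuming each content control is identified by its tag (`w:sdtPr/w:tag/@w:val`) and only needs plain-text content. It works on `word/document.xml` loaded into LINQ to XML: the first `w:t` inside `w:sdtContent` (and so its formatting) receives the new text, the other `w:t` elements are emptied, and the placeholder flag is dropped. The class name and the tag value in the example are hypothetical; a picture content control would additionally need its image part replaced.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ContentControlFiller
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Sets the text of every content control whose tag equals 'tag'.
    // Returns the number of content controls that were updated.
    public static int SetText(XDocument documentXml, string tag, string text)
    {
        int updated = 0;
        var controls = documentXml.Descendants(W + "sdt")
            .Where(sdt => (string)sdt.Element(W + "sdtPr")?.Element(W + "tag")?.Attribute(W + "val") == tag)
            .ToList();
        foreach (XElement sdt in controls)
        {
            var texts = sdt.Element(W + "sdtContent")?.Descendants(W + "t").ToList();
            if (texts == null || texts.Count == 0) continue; // nothing to overwrite
            texts[0].Value = text;
            texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");
            foreach (XElement extra in texts.Skip(1)) extra.Value = "";
            // Without this flag Word no longer treats the content as placeholder text
            sdt.Element(W + "sdtPr").Element(W + "showingPlcHdr")?.Remove();
            updated++;
        }
        return updated;
    }
}
```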

How do you get subject & title from a Word document (without opening it)?

I would like to read the title and subject fields from a Word document, but would rather not have the overhead of firing up Word to do it.
If, in Windows Explorer, I display the Title and Subject columns and then navigate to a folder that has Word documents in it, this information is displayed. What mechanism is being used to do this (aside from shell extensions)? It's fast (though I don't know whether you actually need Word installed for this to work), so I'm guessing it's not firing up Word and opening each document.
I've found a link to Dsofile.dll, which I presume I could use, but does it work for both .doc and .docx files, and is it the only way?

Well... assuming that the days of the ".doc" file are passing, here is one way to get the subject and title from a ".docx" file (or an ".xlsx" file, for that matter).
using System;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on .NET Framework; System.IO.Packaging NuGet package on .NET Core/.NET 5+
namespace ConsoleApplication16
{
    class Program
    {
        static void Main(string[] args)
        {
            String path = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
            String file = Path.Combine(path, "Doc1.docx");
            // Open the package read-only; Word is not started
            using (Package docx = Package.Open(file, FileMode.Open, FileAccess.Read))
            {
                String subject = docx.PackageProperties.Subject;
                String title = docx.PackageProperties.Title;
                Console.WriteLine("Title: {0}, Subject: {1}", title, subject);
            }
        }
    }
}
I hope this is useful to someone.

You can read it via XML, too: How to extract information from Office files by using Office file formats and schemas
Here is another example on how to read a Word doc programmatically.
One way or the other you'll have to look inside the file at some point!
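Reading the XML directly can be sketched with only `System.IO.Compression` and LINQ to XML. This assumes the core properties live at the conventional `docProps/core.xml` path (strictly, the package's root relationships in `_rels/.rels` say where the part is); title and subject are the Dublin Core `dc:title` and `dc:subject` elements. It works for .docx/.xlsx packages, not for binary .doc files. The class name is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class CoreProperties
{
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Returns (title, subject) from an OOXML package stream; null when missing.
    public static (string Title, string Subject) Read(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("docProps/core.xml");
            if (entry == null) return (null, null); // no core properties at the usual path
            using (Stream s = entry.Open())
            {
                XElement root = XDocument.Load(s).Root;
                return ((string)root.Element(Dc + "title"), (string)root.Element(Dc + "subject"));
            }
        }
    }

    public static (string Title, string Subject) Read(string path)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            return Read(fs);
        }
    }
}
```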

YASR - Yet another search and replace question

Environment: ASP.NET, C#, Open XML SDK
OK, so I've been reading a ton of snippets and trying to reinvent the wheel, but I'm hoping that someone can help me get to my destination faster. I have multiple documents that I need to merge together... check... I'm able to do that with the Open XML SDK. Birds are singing, sun is shining so far. Now that I have the document the way I want it, I need to search and replace text and/or content controls.
I've tried using my own placeholder text, {replace this}, but when I look at the XML (rename the .docx to .zip and view the file), the { is nowhere near the text. So I either need to know how to protect that text within the document so it doesn't get split up, or I need to find another way to search and replace.
I'm able to search and replace if it is an XML file, but then I'm back to not being able to combine the documents easily.
Code below... and as I mentioned... the document merge works fine... I just need to replace stuff.
*Update*: I changed my replace call to go after the tag instead of using a regex. I have the right info now, but the .Replace call doesn't seem to work. The last four lines are for validating that I was seeing the right tag contents. I simply want to replace those contents now.
protected void exeProcessTheDoc(object sender, EventArgs e)
{
    string doc1 = Server.MapPath("~/Templates/doc1.docx");
    string doc2 = Server.MapPath("~/Templates/doc2.docx");
    string final_doc = Server.MapPath("~/Templates/extFinal.docx");
    File.Delete(final_doc);
    File.Copy(doc1, final_doc);
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(final_doc, true))
    {
        string altChunkId = "AltChunkId2";
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        using (FileStream fileStream = File.Open(doc2, FileMode.Open))
            chunk.FeedData(fileStream);
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        mainPart.Document.Save();
    }
    exeSearchReplace(final_doc);
}
public static void GetPropertyFromDocument(string document, string outdoc)
{
    XmlDocument xmlProperties = new XmlDocument();
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
    {
        ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;
        xmlProperties.Load(appPart.GetStream());
    }
    XmlNodeList chars = xmlProperties.GetElementsByTagName("Company");
    // string.Replace returns a new string, so the result must be assigned back.
    // This only changes the in-memory XmlDocument, not the .docx file itself.
    chars.Item(0).InnerText = chars.Item(0).InnerText.Replace("{ClientName}", "Penn Inc.");
    using (StreamWriter sw = File.CreateText(outdoc))
    {
        sw.WriteLine(chars.Item(0).InnerText);
    }
}

If I'm reading this right, you have something like "{replace me}" in a .docx, and when you loop through the XML you find things like <t>{replace</t><t> me</t><t>}</t> or some such havoc. With XML like that, a routine that replaces text one text element at a time will never find "{replace me}".
If that's the case, then it's very, very likely related to the fact that it's considered a proofing error, i.e. it's misspelled as far as Word is concerned. The cause is that you've opened the document in Word with proofing turned on. As a result, the text is marked as "isDirty" and split up into different runs.
There are two ways to fix this:
Client-side. In Word, make sure all proofing errors are either corrected or ignored.
Format-side. Use the MarkupSimplifier that is part of Open XML PowerTools to fix this outside of the client. Eric White has a great (and timely for you, just a few days old) write-up on it: Getting Started with Open XML PowerTools Markup Simplifier
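Part of the format-side fix can be approximated without PowerTools. The sketch below is not the MarkupSimplifier API; it only removes `w:proofErr` markers and then merges directly adjacent runs in a paragraph that each hold one `w:t` and have identical `w:rPr`, which is often enough to bring "{replace me}" back into a single `w:t`. Runs with different formatting stay separate, and `XNode.DeepEquals` is sensitive to attribute order. The class name is hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class RunSimplifier
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static void RemoveProofingAndMergeRuns(XDocument documentXml)
    {
        // Proofing markers sit between runs and would keep them from being adjacent.
        documentXml.Descendants(W + "proofErr").Remove();

        foreach (XElement p in documentXml.Descendants(W + "p").ToList())
        {
            XElement prev = null;
            foreach (XElement run in p.Elements(W + "r").ToList())
            {
                bool mergeable = prev != null
                    && prev.ElementsAfterSelf().FirstOrDefault() == run // directly adjacent
                    && IsSimpleTextRun(prev) && IsSimpleTextRun(run)
                    && XNode.DeepEquals(prev.Element(W + "rPr"), run.Element(W + "rPr"));
                if (mergeable)
                {
                    XElement t = prev.Element(W + "t");
                    t.Value += run.Element(W + "t").Value;
                    t.SetAttributeValue(XNamespace.Xml + "space", "preserve");
                    run.Remove();
                }
                else
                {
                    prev = run;
                }
            }
        }
    }

    // A run holding only optional formatting (w:rPr) and exactly one w:t.
    static bool IsSimpleTextRun(XElement run) =>
        run.Elements(W + "t").Count() == 1 &&
        run.Elements().All(e => e.Name == W + "rPr" || e.Name == W + "t");
}
```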

If you want to search and replace text in a WordprocessingML document, there is a fairly easy algorithm that you can use:
Break all runs into runs of a single character. This includes runs that have special characters such as a line break, carriage return, or hard tab.
It is then pretty easy to find a set of runs that match the characters in your search string.
Once you have identified a set of runs that match, then you can replace that set of runs with a newly created run (which has the run properties of the run containing the first character that matched the search string).
After replacing the single-character runs with a newly created run, you can then consolidate adjacent runs with identical formatting.
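The four steps above can be sketched with LINQ to XML on a single `w:p`. This is a simplified illustration, not Eric White's published code: it only splits runs whose content is an optional `w:rPr` plus one `w:t` (runs with tabs, breaks, fields or drawings are left alone and can interrupt a match), and it compares formatting with `XNode.DeepEquals`, which is sensitive to attribute order. Class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SingleCharRunReplacer
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces 'search' within one paragraph; returns the number of replacements.
    public static int Replace(XElement paragraph, string search, string replacement)
    {
        if (string.IsNullOrEmpty(search)) throw new ArgumentException("Empty search string.", nameof(search));

        // Step 1: break every simple text run into runs of one character each.
        foreach (XElement run in paragraph.Elements(W + "r").Where(IsSimpleTextRun).ToList())
        {
            XElement rPr = run.Element(W + "rPr");
            run.ReplaceWith(run.Element(W + "t").Value.Select(c => NewRun(rPr, c.ToString())).ToList());
        }

        int replaced = 0, start = 0;
        while (true)
        {
            // Step 2: find consecutive single-character runs that spell the search string.
            List<XElement> runs = paragraph.Elements(W + "r").ToList();
            int hit = -1;
            for (int i = start; i + search.Length <= runs.Count && hit < 0; i++)
            {
                bool match = true;
                for (int k = 0; k < search.Length && match; k++)
                    match = IsSimpleTextRun(runs[i + k])
                            && runs[i + k].Element(W + "t").Value == search[k].ToString();
                if (match) hit = i;
            }
            if (hit < 0) break;

            // Step 3: replace the matched runs with one new run that keeps the
            // formatting of the run holding the first matched character.
            XElement first = runs[hit];
            first.AddBeforeSelf(NewRun(first.Element(W + "rPr"), replacement));
            for (int k = 0; k < search.Length; k++) runs[hit + k].Remove();
            start = hit + 1; // resume after the new run, so it is never re-matched
            replaced++;
        }

        // Step 4: consolidate adjacent simple runs with identical formatting.
        XElement prev = null;
        foreach (XElement run in paragraph.Elements(W + "r").ToList())
        {
            if (prev != null && prev.ElementsAfterSelf().FirstOrDefault() == run
                && IsSimpleTextRun(prev) && IsSimpleTextRun(run)
                && XNode.DeepEquals(prev.Element(W + "rPr"), run.Element(W + "rPr")))
            {
                prev.Element(W + "t").Value += run.Element(W + "t").Value;
                run.Remove();
            }
            else prev = run;
        }
        return replaced;
    }

    // A run holding only optional formatting (w:rPr) and exactly one w:t.
    static bool IsSimpleTextRun(XElement run) =>
        run.Elements(W + "t").Count() == 1 &&
        run.Elements().All(e => e.Name == W + "rPr" || e.Name == W + "t");

    // Builds a run with a deep copy of the given formatting (if any) and the text.
    static XElement NewRun(XElement rPr, string text) =>
        new XElement(W + "r",
            rPr == null ? null : new XElement(rPr),
            new XElement(W + "t", new XAttribute(XNamespace.Xml + "space", "preserve"), text));
}
```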
I've written a blog post and recorded a screen-cast that walks through this algorithm.
Blog post: http://openxmldeveloper.org/archive/2011/05/12/148357.aspx
Screen cast: http://www.youtube.com/watch?v=w128hJUu3GM
-Eric
