I want to read data, such as a string, from a .docx file in C#. I looked through some of the existing questions but didn't understand which approach to use.
I'm trying to use ApplicationClass Application = new ApplicationClass(); but I get this error:
The type 'Microsoft.Office.Interop.Word.ApplicationClass' has no constructors defined
I want to get the full text of my .docx file, not separate words!
foreach (FileInfo f in docFiles)
{
    Application wo = new Application();
    object nullobj = Missing.Value;
    object file = f.FullName;
    Document doc = wo.Documents.Open(ref file, ..., ref nullobj);
    doc.Activate();
    // doc.??? : what goes here to get the whole text?
}
How can I get the whole text from a .docx file?
This is what I used to extract the whole text from the .docx file:
// ZipFile here is DotNetZip's Ionic.Zip.ZipFile, not System.IO.Compression
using (ZipFile zip = ZipFile.Read(filename))
using (MemoryStream stream = new MemoryStream())
{
    // Extract only the main document part into memory
    zip[@"word/document.xml"].Extract(stream);
    stream.Seek(0, SeekOrigin.Begin);
    XmlDocument xmldoc = new XmlDocument();
    xmldoc.Load(stream);
    string PlainTextContent = xmldoc.DocumentElement.InnerText;
}
Try using the Word.Application interface instead of ApplicationClass.
See "Understanding Office Primary Interop Assembly Classes and Interfaces".
The .docx format, like the other Microsoft Office formats whose extensions end in "x", is simply a ZIP package that you can open, modify and compress.
So use an Office Open XML library, like this one.
Enjoy.
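As a sketch of that approach, the snippet below uses the Open XML SDK (the DocumentFormat.OpenXml NuGet package; the library linked in this answer is not named, so this choice is an assumption) to collect the text of each paragraph and join the paragraphs with line breaks, which gives you the whole text rather than separate words. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper; assumes the DocumentFormat.OpenXml NuGet package is referenced.
public static class OpenXmlTextReader
{
    public static string ReadAllText(string docxPath)
    {
        // Open the package read-only; it is closed again when the using block ends.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            // Paragraph.InnerText concatenates the text of all runs in that paragraph;
            // joining with NewLine keeps paragraphs on separate lines.
            return string.Join(Environment.NewLine,
                body.Descendants<Paragraph>().Select(p => p.InnerText));
        }
    }
}
```

Usage: string text = OpenXmlTextReader.ReadAllText(@"C:\docs\input.docx");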
Make sure you are targeting .NET Framework 4.5 or later, since System.IO.Compression.ZipFile was introduced in 4.5.
using NUnit.Framework;
[TestFixture]
public class GetDocxInnerTextTestFixture
{
    private string _inputFilepath = @"../../TestFixtures/TestFiles/input.docx";
    [Test]
    public void GetDocxInnerText()
    {
        string documentText = DocxInnerTextReader.GetDocxInnerText(_inputFilepath);
        Assert.IsNotNull(documentText);
        Assert.IsTrue(documentText.Length > 0);
    }
}
using System.IO;
using System.IO.Compression;
using System.Xml;
public static class DocxInnerTextReader
{
    public static string GetDocxInnerText(string docxFilepath)
    {
        string folder = Path.GetDirectoryName(docxFilepath);
        string extractionFolder = Path.Combine(folder, "extraction");
        // Start from a clean extraction folder
        if (Directory.Exists(extractionFolder))
            Directory.Delete(extractionFolder, true);
        ZipFile.ExtractToDirectory(docxFilepath, extractionFolder);
        string xmlFilepath = Path.Combine(extractionFolder, "word", "document.xml");
        var xmldoc = new XmlDocument();
        xmldoc.Load(xmlFilepath);
        // Remove the extracted files once document.xml has been loaded
        Directory.Delete(extractionFolder, true);
        return xmldoc.DocumentElement.InnerText;
    }
}
First, add references to these assemblies:
System.Xml
System.IO.Compression.FileSystem
Second, make sure you have these using directives in your class:
using System.IO;
using System.IO.Compression;
using System.Xml;
Then you can use the code below:
public string DocxToString(string docxPath)
{
    // Destination of the extraction directory (next to the .docx file)
    string extractDir = Path.Combine(Path.GetDirectoryName(docxPath), Path.GetFileName(docxPath) + ".tmp");
    // Delete any old extraction directory
    if (Directory.Exists(extractDir)) Directory.Delete(extractDir, true);
    // Extract all media and XML files into the destination directory
    ZipFile.ExtractToDirectory(docxPath, extractDir);
    XmlDocument xmldoc = new XmlDocument();
    // Load the XML file that contains all of the document's text
    xmldoc.Load(Path.Combine(extractDir, "word", "document.xml"));
    // Delete the extraction directory
    Directory.Delete(extractDir, true);
    // Read all text of the document from the XML
    return xmldoc.DocumentElement.InnerText;
}
Enjoy...
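If you would rather not write an extraction folder to disk at all, a minimal sketch (assuming .NET Framework 4.5+ with the System.IO.Compression.FileSystem reference listed in this answer) can open word/document.xml directly inside the archive with ZipFile.OpenRead. It uses LINQ to XML instead of XmlDocument and emits one line per w:p paragraph, because InnerText on the whole document runs all paragraphs together. The class name DocxZipTextReader is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads document.xml from the ZIP package without extracting files.
public static class DocxZipTextReader
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string DocxToString(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found.");
            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                // Concatenate the w:t text nodes of each paragraph, one paragraph per line.
                var paragraphs = xml.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => (string)t)));
                return string.Join(Environment.NewLine, paragraphs);
            }
        }
    }
}
```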
I'm trying to edit an XML file, but document.Save() only works if I give it another file name.
Is there any way to save to the same file, or another method I can use? Thank you!
string path = "test.xml";
using (FileStream xmlFile = File.OpenRead(path))
{
    XDocument document = XDocument.Load(xmlFile);
    var setupEl = document.Root;
    var groupEl = setupEl.Elements().ElementAt(0);
    var valueEl = groupEl.Elements().ElementAt(1);
    valueEl.Value = "Test2";
    document.Save("test-result.xml");
    // document.Save("test.xml"); I want to use this line.
}
I receive the error:
The process cannot access the file '[...]\test.xml' because it is being used by another process.
The problem is that you are trying to write to the file while you still have it open. However, you have no need to have it open once you've loaded the XML file. Simply scoping your code more granularly will solve the issue:
string path = "test.xml";
XDocument document;
using (FileStream xmlFile = File.OpenRead(path))
{
    document = XDocument.Load(xmlFile);
}
// the rest of your code; document.Save(path) now works because the file is closed
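Alternatively, let XDocument manage the file itself: XDocument.Load(string) opens the file, reads it and closes it before returning, so saving back to the same path works. A sketch, where the helper name SetSecondValue and the element layout (the value is the second child of the root's first child) are assumptions mirroring the question's code:

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper mirroring the question's navigation logic.
public static class XmlInPlaceEditor
{
    public static void SetSecondValue(string path, string newValue)
    {
        // Load(string) does not keep the file open after it returns.
        XDocument document = XDocument.Load(path);
        XElement groupEl = document.Root.Elements().ElementAt(0);
        XElement valueEl = groupEl.Elements().ElementAt(1);
        valueEl.Value = newValue;
        // Overwrites the original file; no handle to it is still open.
        document.Save(path);
    }
}
```

Usage: XmlInPlaceEditor.SetSecondValue("test.xml", "Test2");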
Using C# in VS 2012, I am trying to convert a .doc file to a .docx file, but I get two errors:
Error 1: 'Application' is an ambiguous reference between 'System.Windows.Forms.Application' and 'Microsoft.Office.Interop.Word.Application'
Error 2: the type 'System.Windows.Forms.Application' has no constructors defined
using System.IO;
using Microsoft.Office.Interop.Word;
public void ConvertDocToDocx(string path)
{
    Application word = new Application();
    if (path.ToLower().EndsWith(".doc"))
    {
        var sourceFile = new FileInfo(path);
        var document = word.Documents.Open(sourceFile.FullName);
        string newFileName = sourceFile.FullName.Replace(".doc", ".docx");
        document.SaveAs2(newFileName, WdSaveFormat.wdFormatXMLDocument,
                         CompatibilityMode: WdCompatibilityMode.wdWord2010);
        word.ActiveDocument.Close();
        word.Quit();
        File.Delete(path);
    }
}
You are using both namespaces in that file. You could:
Use the fully qualified name, e.g. Microsoft.Office.Interop.Word.Application instead of Application
Declare an alias for the class, e.g. using WordApp = Microsoft.Office.Interop.Word.Application;, and then use WordApp instead of Application
Remove the unused namespace (only if you really don't use it)
Example:
using System.IO;
using Microsoft.Office.Interop.Word;
public void ConvertDocToDocx(string path)
{
    var word = new Microsoft.Office.Interop.Word.Application();
    ...
}
I'm trying to add a CustomXmlPart to a .docm file, without success.
Apparently the file is too big (more than 10 MB) to be included in the package.
If the XML file is smaller than 7 MB, the document opens successfully.
Any ideas?
Thank you for your help.
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open("doc.docm", true))
{
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    if (wordDoc.MainDocumentPart.CustomXmlParts != null)
    {
        wordDoc.MainDocumentPart.DeleteParts<CustomXmlPart>(wordDoc.MainDocumentPart.CustomXmlParts);
    }
    CustomXmlPart myXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
    using (FileStream stream = new FileStream("10mbfile.xml", FileMode.Open))
    {
        myXmlPart.FeedData(stream);
    }
    wordDoc.Package.Flush();
}
EDIT: I found the issue: the XML file contains a lot of carriage returns + line feeds. After removing them, I can embed the file as a CustomXmlPart.
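One way to do that stripping in code is sketched below, assuming (as the EDIT reports) that the line breaks between elements are what caused the problem. XDocument.Load with LoadOptions.None drops whitespace-only text nodes, and Save with SaveOptions.DisableFormatting writes the XML back without indentation or added line breaks; the compacted bytes are then passed to FeedData. Only whitespace-only text between elements is dropped. The helper name FeedCompacted is hypothetical.

```csharp
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper; assumes the DocumentFormat.OpenXml NuGet package is referenced.
public static class CustomXmlFeeder
{
    public static void FeedCompacted(CustomXmlPart part, string xmlPath)
    {
        // LoadOptions.None discards whitespace-only text nodes (e.g. CR+LF indentation).
        XDocument xml = XDocument.Load(xmlPath, LoadOptions.None);
        using (var buffer = new MemoryStream())
        {
            // DisableFormatting: no indentation and no new line breaks between elements.
            xml.Save(buffer, SaveOptions.DisableFormatting);
            buffer.Position = 0;
            part.FeedData(buffer);
        }
    }
}
```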
The following unit test demonstrates that you can add very large custom XML parts (up to 30MB in the example) to a Word document:
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using Xunit;
namespace CodeSnippets.Tests.OpenXml.Wordprocessing
{
    public class LargeCustomXmlPartsTests
    {
        public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        [Theory]
        [InlineData(5)]
        [InlineData(10)]
        [InlineData(15)]
        [InlineData(20)]
        [InlineData(30)]
        public void CanCreateLargeCustomXmlParts(int size)
        {
            int desiredStreamLength = size * 1024 * 1024;
            string path = $"Document_{size:D2}MB.docm";
            // Create a macro-enabled Word document with a custom XML part having
            // at least the desired size in MB.
            CreateMacroEnabledWordDocument(path, size);
            // Assert that the document does have a custom XML part with at least
            // the desired size.
            using WordprocessingDocument wordDocument = WordprocessingDocument.Open(path, false);
            CustomXmlPart customXmlPart = wordDocument.MainDocumentPart.CustomXmlParts.First();
            using Stream stream = customXmlPart.GetStream(FileMode.Open, FileAccess.Read);
            Assert.True(stream.Length > desiredStreamLength);
        }
        private static void CreateMacroEnabledWordDocument(string path, int size)
        {
            const WordprocessingDocumentType type = WordprocessingDocumentType.MacroEnabledDocument;
            using WordprocessingDocument wordDocument = WordprocessingDocument.Create(path, type);
            // Create a main document part with an empty document.
            MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
            WriteRootElement(mainDocumentPart,
                new XElement(W + "document",
                    new XElement(W + "body",
                        new XElement(W + "p"))));
            // Create a custom XML part with the desired size in MB.
            CustomXmlPart customXmlPart = mainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
            WriteRootElement(customXmlPart, CreatePartRootElement(size));
        }
        private static void WriteRootElement(OpenXmlPart part, XElement partRootElement)
        {
            using Stream stream = part.GetStream(FileMode.Create, FileAccess.Write);
            using XmlWriter writer = XmlWriter.Create(stream);
            partRootElement.WriteTo(writer);
        }
        private static XElement CreatePartRootElement(int size)
        {
            var random = new Random();
            return new XElement("root",
                Enumerable.Range(0, size).Select(paragraphIndex =>
                    new XElement("p",
                        new XAttribute("i", paragraphIndex),
                        Enumerable.Range(0, 1000).Select(runIndex =>
                            new XElement("r",
                                new XAttribute("i", runIndex),
                                new XElement("t", CreateRandomString(random)))))));
        }
        private static string CreateRandomString(Random random)
        {
            char[] value = Enumerable
                .Range(0, 930)
                .Select(i => Convert.ToChar(random.Next(33, 125)))
                .ToArray();
            return new string(value);
        }
    }
}
On my Windows 10 notebook, Microsoft Word for Office 365 opens the document with the 30MB custom XML part without any problems. Therefore, I'd say, your problem must be caused by other factors or a combination of factors, including any processing of the custom XML part performed by the VSTO add-in that was mentioned in a comment.
In C#, I am trying to check whether an XML file exists; if not, create the file and then add the XML declaration, a comment and a parent node.
When I try to load it, it gives me this error:
"The process cannot access the file 'C:\FileMoveResults\Applications.xml' because it is being used by another process."
I checked Task Manager to make sure the file wasn't open, and sure enough no application had it open. Any ideas what's going on?
Here is the code I am using:
//check for the xml file
if (!File.Exists(GlobalVars.strXMLPath))
{
    //create the xml file
    File.Create(GlobalVars.strXMLPath);
    //create the structure
    XmlDocument doc = new XmlDocument();
    doc.Load(GlobalVars.strXMLPath);
    //create the xml declaration
    XmlDeclaration xdec = doc.CreateXmlDeclaration("1.0", null, null);
    //create the comment
    XmlComment xcom = doc.CreateComment("This file contains all the apps, versions, source and destination paths.");
    //create the application parent node
    XmlNode newApp = doc.CreateElement("applications");
    //save
    doc.Save(GlobalVars.strXMLPath);
}
Here is the code I ended up using to fix this issue:
//check for the xml file
if (!File.Exists(GlobalVars.strXMLPath))
{
    using (XmlWriter xWriter = XmlWriter.Create(GlobalVars.strXMLPath))
    {
        xWriter.WriteStartDocument();
        xWriter.WriteComment("This file contains all the apps, versions, source and destination paths.");
        xWriter.WriteStartElement("application");
        xWriter.WriteFullEndElement();
        xWriter.WriteEndDocument();
    }
}
File.Create() returns a FileStream that locks the file until it's closed.
You don't need to call File.Create() at all; doc.Save() will create or overwrite the file.
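Putting both points together, here is a sketch that creates the file only when it is missing; the class and method names are hypothetical. Note that the question's code also never appends the declaration, comment and element it creates: nodes made with CreateXmlDeclaration, CreateComment and CreateElement are not part of the document until AppendChild is called.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper for the "create if missing, otherwise load" pattern.
public static class AppsXmlFile
{
    public static XmlDocument LoadOrCreate(string path)
    {
        var doc = new XmlDocument();
        if (File.Exists(path))
        {
            doc.Load(path);
            return doc;
        }
        // The declaration must be the first child; the comment may precede the root element.
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", null));
        doc.AppendChild(doc.CreateComment("This file contains all the apps, versions, source and destination paths."));
        doc.AppendChild(doc.CreateElement("applications"));
        // Save(string) creates or overwrites the file and closes it afterwards,
        // so no File.Create call (and no leftover open FileStream) is needed.
        doc.Save(path);
        return doc;
    }
}
```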
I would suggest something like this:
string filePath = "C:/myFilePath";
XmlDocument doc = new XmlDocument();
if (System.IO.File.Exists(filePath))
{
    doc.Load(filePath);
}
else
{
    // Option 1: write the new file with an XmlWriter
    using (XmlWriter xWriter = XmlWriter.Create(filePath))
    {
        xWriter.WriteStartDocument();
        xWriter.WriteStartElement("ElementName"); // element names cannot contain spaces
        xWriter.WriteEndElement();
        xWriter.WriteEndDocument();
    }
    //OR
    // Option 2: build the document in memory; created nodes must be appended to be saved
    XmlDeclaration xdec = doc.CreateXmlDeclaration("1.0", null, null);
    XmlComment xcom = doc.CreateComment("This file contains all the apps, versions, source and destination paths.");
    XmlNode root = doc.CreateElement("applications");
    root.AppendChild(doc.CreateElement("applications1"));
    root.AppendChild(doc.CreateElement("applications2"));
    doc.AppendChild(xdec);
    doc.AppendChild(xcom);
    doc.AppendChild(root); // a document can have only one root element
    doc.Save(filePath); //save a copy
}
Your code currently fails because File.Create creates the file and opens a stream to it, and you never use or close that stream, on this line:
//create the xml file
File.Create(GlobalVars.strXMLPath);
If you did something like this:
//create the xml file
using(Stream fStream = File.Create(GlobalVars.strXMLPath)) { }
Then you would not get that in use exception.
As a side note, XmlDocument.Load will not create a file; it only works with one that already exists.
You could create a stream, setting the FileMode to FileMode.Create, and then use the stream to save the XML to the specified path.
using (System.IO.Stream stream = new System.IO.FileStream(GlobalVars.strXMLPath, FileMode.Create))
{
    XmlDocument doc = new XmlDocument();
    ...
    doc.Save(stream);
}
I'm adding some custom XML to a .docx so that I can track it inside an application I'm writing.
I've done it both manually, by opening the Word document with a ZIP library, and via the official Open XML SDK route. Both have the same outcome: my XML is inserted into the customXml folder in the document. The document opens fine in Word with both methods, and the XML is present.
But when I then save the document, as MyDoc2.docx for example, all my XML disappears.
What am I doing wrong?
Microsoft links I've been following:
http://msdn.microsoft.com/en-us/library/bb608597.aspx
http://msdn.microsoft.com/en-us/library/bb608612.aspx
And the code I've taken from the Open XML SDK 2.0:
public static void AddNewPart(string document, string fileName)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;
        CustomXmlPart myXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
        using (FileStream stream = new FileStream(fileName, FileMode.Open))
        {
            myXmlPart.FeedData(stream);
        }
    }
}
Thanks,
John
OK, so I managed to find the article "Using Custom XML Part as DataStore" on openxmldeveloper.org, and have stripped out the unnecessary code so that it inserts and retains the custom XML:
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using Ds = DocumentFormat.OpenXml.CustomXmlDataProperties;
static void Main(string[] args)
{
    using (WordprocessingDocument doc = WordprocessingDocument.Open("Test.docx", true, new OpenSettings()))
    {
        int customXmlPartsCount = doc.MainDocumentPart.GetPartsCountOfType<CustomXmlPart>();
        if (customXmlPartsCount == 0)
        {
            CustomXmlPart customXmlPersonDataSourcePart = doc.MainDocumentPart.AddNewPart<CustomXmlPart>("application/xml", null);
            using (FileStream stream = new FileStream("Test.xml", FileMode.Open))
            {
                customXmlPersonDataSourcePart.FeedData(stream);
            }
            CustomXmlPropertiesPart customXmlPersonPropertiesDataSourcePart = customXmlPersonDataSourcePart
                .AddNewPart<CustomXmlPropertiesPart>("Rd3c4172d526e4b2384ade4b889302c76");
            Ds.DataStoreItem dataStoreItem1 = new Ds.DataStoreItem() { ItemId = "{88e81a45-98c0-4d79-952a-e8203ce59aac}" };
            customXmlPersonPropertiesDataSourcePart.DataStoreItem = dataStoreItem1;
        }
    }
}
So all the examples from Microsoft work as long as you don't modify the file. The problem appears to be that we don't set up the relationship with the main document.
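Compared with the Microsoft samples, the main addition in that code is the CustomXmlPropertiesPart holding a ds:datastoreItem with an ItemId. A reusable sketch of the same idea, assuming the Open XML SDK 2.x API used in this answer; the helper name AddWithProperties is hypothetical, and generating a fresh GUID per part (instead of reusing the hard-coded ItemId) is an assumption about what you want when adding several parts:

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using Ds = DocumentFormat.OpenXml.CustomXmlDataProperties;

// Hypothetical helper; assumes the DocumentFormat.OpenXml NuGet package is referenced.
public static class CustomXmlStore
{
    public static CustomXmlPart AddWithProperties(WordprocessingDocument doc, Stream xml)
    {
        // Custom XML part related to the main document part.
        CustomXmlPart part = doc.MainDocumentPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
        part.FeedData(xml);

        // Properties part (itemProps) related to the custom XML part, carrying a datastore item ID.
        CustomXmlPropertiesPart props = part.AddNewPart<CustomXmlPropertiesPart>();
        props.DataStoreItem = new Ds.DataStoreItem(new Ds.SchemaReferences())
        {
            // "B" format yields a GUID wrapped in braces, e.g. {XXXXXXXX-XXXX-...}.
            ItemId = Guid.NewGuid().ToString("B").ToUpperInvariant()
        };
        props.DataStoreItem.Save();
        return part;
    }
}
```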