C#: copy from a text file to a Word document

I want to copy data from a text file to a Word file. I have already tried several approaches with Interop (a string array, StringBuilder, and StreamReader). They work, but they take too much time. I would be grateful if anyone could suggest a better approach. I have been through many forums on the web but couldn't find one.
FYI: My text file contains more than 100,000 lines.
This is one of the approaches I have tried:
string[] lines = File.ReadAllLines(path); //path is text file path
var doc = new MSWord.Document();
foreach (string lin in lines)
{
doc.Content.Text += lin.ToString();
}
doc.Save();
Well, this works, but it takes a lot of time and sometimes throws an error like:
Unhandled Exception: System.Runtime.InteropServices.COMException: Word has encountered a problem.

static void Main(string[] args)
{
Word.Application wordApp = new Word.Application();
Word.Document wordDoc = wordApp.Documents.Add();
Stopwatch sw = Stopwatch.StartNew();
System.Console.WriteLine("Starting");
string path = @"C:\";
StringBuilder stringBuilder = new StringBuilder();
using (FileStream fs = File.Open(path + "\\big.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
wordDoc.Content.Text = sr.ReadToEnd();
wordDoc.SaveAs("big.docx");
}
sw.Stop();
System.Console.WriteLine($"Complete Time :{sw.ElapsedMilliseconds}");
System.Console.ReadKey();
}
Output :
Starting
Complete Time :5556
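Or you can use Parallel. Note that StringBuilder is not thread-safe and Parallel.ForEach does not guarantee order, so appending characters this way can scramble or corrupt the text: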
using (StreamReader sr = new StreamReader(bs))
{
Parallel.ForEach(sr.ReadToEnd(), i=>
{
stringBuilder.Append(i);
});
wordDoc.Content.Text = stringBuilder.ToString();
wordDoc.SaveAs(path + "\\big3.docx");
}
Output:
Starting
Complete Time :2587

Microsoft Word can read text files, so why not open the text file as an Interop Word Document and then convert it using one of the SaveAs methods?
I tested with a 34 MB, 1,000,000-line text file; the result was a 22 MB DOCX file:
MSWord.Application appAC = new MSWord.Application();
MSWord.Document doc = appAC.Documents.Open("TestRead.txt");
doc.SaveAs2(FileName:"TestSave", FileFormat:WdSaveFormat.wdFormatDocumentDefault);
doc.Close();
appAC.Quit();
Note that Microsoft states a maximum document size of 32 MB. The text file exceeded this, but the resulting DOCX file was smaller, so your exception may be related to the size of the final file.
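If Word does not have to be involved at all, another option is to write the DOCX directly with the Open XML SDK. This is a different technique from the Interop answers above, shown only as a minimal sketch: it assumes the DocumentFormat.OpenXml NuGet package is referenced, and the class and method names (TextToDocx, Convert) are made up for illustration. Each line of the text file becomes one paragraph.

```csharp
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of any library.
public static class TextToDocx
{
    // Writes each input line as its own paragraph into a new DOCX package
    // created on the given stream (the stream must be readable, writable and seekable).
    public static void Convert(IEnumerable<string> lines, Stream output)
    {
        using (WordprocessingDocument doc =
            WordprocessingDocument.Create(output, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = doc.AddMainDocumentPart();
            Body body = new Body();

            foreach (string line in lines)
            {
                // Preserve leading and trailing spaces in the line.
                Text text = new Text(line) { Space = SpaceProcessingModeValues.Preserve };
                body.AppendChild(new Paragraph(new Run(text)));
            }

            mainPart.Document = new Document(body);
            mainPart.Document.Save();
        }
    }
}
```

A call could look like `using (FileStream fs = File.Create("big.docx")) TextToDocx.Convert(File.ReadLines("big.txt"), fs);`. File.ReadLines streams the input instead of loading it all at once. Lines containing characters that are not valid in XML (most control characters) would need to be cleaned first.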

Related

C#: How can I detect the end of a page in a Word document with Word Interop

I'm developing with C# Interop Word, and I need to know when a page ends, because I'm creating a Word table. The number of rows isn't always the same, and if the page doesn't end, I have to insert a page break.
How can I do that?
If you're using StreamReader to read your Word file, you can do this:
string path = @"C:\..."; // the path of your Word file
StreamReader sr = new StreamReader(path);
while (!sr.EndOfStream) {
// what you want to do
}
sr.Close();
If you want to know at which line your file ends, keep a counter and check its value when the file ends:
string path = @"C:\..."; // the path of your Word file
int count = 0;
StreamReader sr = new StreamReader(path);
while (!sr.EndOfStream) {
// what you want to do
count++;
}
sr.Close();

C# creating Word file - Error opening file

I am creating some Word files (and replacing some words) from a Word template using this code snippet:
File.Copy(sourceFile, destinationFile, true);
try
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(destinationFile, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
foreach (KeyValuePair<string, string> item in keyValues)
{
Regex regexText = new Regex(item.Key);
docText = regexText.Replace(docText, item.Value);
}
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
But when I try to open the produced file I get this error:
We're sorry. We can't open XXX.docx because we found a problem with its contents
XML parsing error Location:Part:/word/document.xml, Line:2, Column:33686
Here are the contents of document.xml in the specific place:
.....<w:szCs w:val="20"/><w:lang w:val="en-US"/></w:rPr><w:t> </w:t></w:r><w:proofErr w:type="spellEnd"/>....
Where 33686 is the position of the &nbsp;. How can I fix this problem?
EDIT: In another file that was produced correctly, the same position contains some random characters I used for testing, which are also used in the title of the document.
It looks like you're using regular expressions to directly modify XML, which typically leads to difficult-to-troubleshoot issues like this, especially if any of your regexes match anything that could be interpreted as XML.
As an alternative, you may want to investigate the WordprocessingDocument class more deeply. It looks like it has strongly typed objects like Paragraph that you can modify more safely.
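As a rough sketch of that idea, the replacement can be applied to the typed Text elements instead of the raw XML string, so the SDK takes care of escaping when it saves. This assumes the DocumentFormat.OpenXml package; TextNodeReplacer and ReplaceInTextNodes are hypothetical names. It uses plain string replacement rather than Regex, and it only finds a key that sits entirely inside one text element.

```csharp
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class TextNodeReplacer
{
    // Replaces every key with its value inside each <w:t> element of the main document part.
    public static void ReplaceInTextNodes(Stream docx, IDictionary<string, string> keyValues)
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(docx, true))
        {
            Document document = wordDoc.MainDocumentPart.Document;

            foreach (Text text in document.Descendants<Text>())
            {
                foreach (KeyValuePair<string, string> item in keyValues)
                {
                    // Only the text content changes; the surrounding markup is untouched.
                    text.Text = text.Text.Replace(item.Key, item.Value);
                }
            }

            document.Save();
        }
    }
}
```

Because only text content is touched, a value containing characters such as `<` or `&` is stored as text and cannot break the structure of document.xml.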

Replacing Template Fields in Word Documents with OpenXML

I am trying to create a templating system with OpenXML in our Azure app service-based application (so no Interop) and am running into issues with getting it to work. Here is the code I am currently working with (contents is a byte array):
using(MemoryStream stream = new MemoryStream(contents))
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(stream, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("<< Company.Name >>");
docText = regexText.Replace(docText, "Company 123");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
wordDoc.Save();
}
updated = stream.ToArray();
}
The search text is not being found/replaced, which I am assuming is because of the way everything is stored separately in the XML, but how would I go about replacing a field like this?
Thanks
Ryan
With the OpenXML SDK you can use this SearchAndReplace - YouTube; note that it's a screencast that shows an algorithm for performing the replacement across multiple <w:r> elements.
An alternative approach would be to use a pure .NET solution, like this Find and Replace text in a Word document.
Last, the easiest and most straightforward approach would be to use some other library; for instance, check this example of Find and Replace with GemBox.Document.
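A simplified way to handle a field that Word has split across several runs is to work one paragraph at a time: join the text of all its text elements, replace in the joined string, then put the result into the first element and empty the others. The sketch below is not the algorithm from the screencast, only an illustration under stated assumptions: it requires the DocumentFormat.OpenXml package, ParagraphReplacer is a made-up name, and the replaced paragraph takes on the formatting of its first run.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class ParagraphReplacer
{
    // Replaces 'search' with 'replacement' even when 'search' spans several runs
    // of the same paragraph.
    public static void Replace(Stream docx, string search, string replacement)
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(docx, true))
        {
            Document document = wordDoc.MainDocumentPart.Document;

            foreach (Paragraph paragraph in document.Descendants<Paragraph>())
            {
                List<Text> texts = paragraph.Descendants<Text>().ToList();
                if (texts.Count == 0) continue;

                // The paragraph's visible text, in document order.
                string joined = string.Concat(texts.Select(t => t.Text));
                if (!joined.Contains(search)) continue;

                // Keep the whole result in the first text element and blank the rest.
                texts[0].Text = joined.Replace(search, replacement);
                texts[0].Space = SpaceProcessingModeValues.Preserve;
                foreach (Text rest in texts.Skip(1))
                {
                    rest.Text = string.Empty;
                }
            }

            document.Save();
        }
    }
}
```

With the byte array from the question, the stream passed in would need to be resizable, for example a `new MemoryStream()` that the bytes are written into, rather than `new MemoryStream(contents)`.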

Inserting a doc file inplace of place holder

I have a Word document that contains many pages. One of those pages contains a placeholder instead of other content, and I want to replace that placeholder with another doc file without losing formatting. The doc file to be inserted may have many pages. How can I replace the placeholder with this doc file programmatically? I searched a lot but could not find any option for inserting a doc file in place of a placeholder. Thank you in advance.
Or how can we copy the contents of the doc to be inserted and then replace the placeholder with the copied content?
I found a post here. The code below is from that post.
With the library, you can do the following to replace text from a Word document, considering that documentByteArray is your document byte content taken from database:
using (MemoryStream mem = new MemoryStream())
{
mem.Write(documentByteArray, 0, (int)documentByteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
What if, instead of "Hi Everyone!", we want to insert binary data, which is an array of bytes:
byte[] binarydata = File.ReadAllBytes(filepaths);
How can we modify the program?
First of all you should get a NuGet package called Novacode.Docx; it is the best document creator and editor I have found in the last few years.
using Novacode.Docx;
void Main()
{
var doc = DocX.Load(@"c:\temp\existingDoc.docx");
var docToAdd = DocX.Load(@"c:\temp\docToAdd.docx");
doc.InsertDocument(docToAdd, true); //version 1.0.0.22
doc.InsertDocument(docToAdd); //version 1.0.0.19
}
This is the simplest and most basic implementation of what you're after, but it works.
For anything else, take a look at the documentation at
https://docx.codeplex.com/
or
http://cathalscorner.blogspot.co.uk/
This will be the best place to start. If you do use this library, I would also recommend version 1.0.0.19, as there are some formatting issues in 1.0.0.22.

Manipulating Word 2007 Document XML in C#

I am trying to manipulate the XML of a Word 2007 document in C#. I have managed to find and manipulate the node that I want but now I can't seem to figure out how to save it back. Here is what I am trying:
// Open the document from memoryStream
Package pkgFile = Package.Open(memoryStream, FileMode.Open, FileAccess.ReadWrite);
PackageRelationshipCollection pkgrcOfficeDocument = pkgFile.GetRelationshipsByType(strRelRoot);
foreach (PackageRelationship pkgr in pkgrcOfficeDocument)
{
if (pkgr.SourceUri.OriginalString == "/")
{
Uri uriData = new Uri("/word/document.xml", UriKind.Relative);
PackagePart pkgprtData = pkgFile.GetPart(uriData);
XmlDocument doc = new XmlDocument();
doc.Load(pkgprtData.GetStream());
NameTable nt = new NameTable();
XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
nsManager.AddNamespace("w", nsUri);
XmlNodeList nodes = doc.SelectNodes("//w:body/w:p/w:r/w:t", nsManager);
foreach (XmlNode node in nodes)
{
if (node.InnerText == "{{TextToChange}}")
{
node.InnerText = "success";
}
}
if (pkgFile.PartExists(uriData))
{
// Delete template "/customXML/item1.xml" part
pkgFile.DeletePart(uriData);
}
PackagePart newPkgprtData = pkgFile.CreatePart(uriData, "application/xml");
StreamWriter partWrtr = new StreamWriter(newPkgprtData.GetStream(FileMode.Create, FileAccess.Write));
doc.Save(partWrtr);
partWrtr.Close();
}
}
pkgFile.Close();
I get the error 'Memory stream is not expandable'. Any ideas?
I would recommend that you use Open XML SDK instead of hacking the format by yourself.
Using OpenXML SDK 2.0, I do this:
public void SearchAndReplace(Dictionary<string, string> tokens)
{
using (WordprocessingDocument doc = WordprocessingDocument.Open(_filename, true))
ProcessDocument(doc, tokens);
}
private string GetPartAsString(OpenXmlPart part)
{
string text = String.Empty;
using (StreamReader sr = new StreamReader(part.GetStream()))
{
text = sr.ReadToEnd();
}
return text;
}
private void SavePart(OpenXmlPart part, string text)
{
using (StreamWriter sw = new StreamWriter(part.GetStream(FileMode.Create)))
{
sw.Write(text);
}
}
private void ProcessDocument(WordprocessingDocument doc, Dictionary<string, string> tokenDict)
{
ProcessPart(doc.MainDocumentPart, tokenDict);
foreach (var part in doc.MainDocumentPart.HeaderParts)
{
ProcessPart(part, tokenDict);
}
foreach (var part in doc.MainDocumentPart.FooterParts)
{
ProcessPart(part, tokenDict);
}
}
private void ProcessPart(OpenXmlPart part, Dictionary<string, string> tokenDict)
{
string docText = GetPartAsString(part);
foreach (var keyval in tokenDict)
{
Regex expr = new Regex(_starttag + keyval.Key + _endtag);
docText = expr.Replace(docText, keyval.Value);
}
SavePart(part, docText);
}
From this you could write a GetPartAsXmlDocument, do what you want with it, and then stream it back with SavePart(part, xmlString).
Hope this helps!
You should use the OpenXML SDK to work on docx files and not write your own wrapper.
Getting Started with the Open XML SDK 2.0 for Microsoft Office
Introducing the Office (2007) Open XML File Formats
How to: Manipulate Office Open XML Formats Documents
Manipulate Docx with C# without Microsoft Word installed with OpenXML SDK
The problem appears to be doc.Save(partWrtr), which is built using newPkgprtData, which is built using pkgFile, which loads from a memory stream... Because you loaded from a memory stream it's trying to save the document back to that same memory stream. This leads to the error you are seeing.
Instead of saving it to the memory stream try saving it to a new file or to a new memory stream.
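For the "new memory stream" option, the relevant detail is that a MemoryStream constructed from a byte array has a fixed capacity, while one created with the parameterless constructor can grow. A minimal sketch, assuming the document bytes are already in an array (StreamHelper and ToExpandableStream are made-up names):

```csharp
using System.IO;

// Hypothetical helper, not part of the framework.
public static class StreamHelper
{
    // new MemoryStream(bytes) wraps the array and cannot grow beyond its length;
    // copying the bytes into a parameterless MemoryStream gives a resizable one.
    public static MemoryStream ToExpandableStream(byte[] bytes)
    {
        MemoryStream stream = new MemoryStream();
        stream.Write(bytes, 0, bytes.Length);
        stream.Position = 0; // rewind so the package can be read from the start
        return stream;
    }
}
```

The returned stream can then be passed to Package.Open or WordprocessingDocument.Open in place of the original one, and ToArray() on it afterwards yields the edited document.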
The short and simple answer to the issue with getting 'Memory stream is not expandable' is:
Do not open the document from memoryStream.
So in that respect the earlier answer is correct, simply open a file instead.
Opening from a MemoryStream and editing the document (in my experience) easily leads to 'Memory stream is not expandable'.
I suppose the message appears when one makes edits that require the memory stream to expand.
I have found that I can do some edits, but not anything that adds to the size.
So, for example, deleting a custom XML part is OK, but adding one with some data is not.
So if you actually need to open a memory stream, you must figure out how to open an expandable MemoryStream if you want to add to it.
I have a need for this and hope to find a solution.
Stein-Tore Erdal
PS: just noticed the answer from "Jan 26 '11 at 15:18".
Don't think that is the answer in all situations.
I get the error when trying this:
var ms = new MemoryStream(bytes);
using (WordprocessingDocument wd = WordprocessingDocument.Open(ms, true))
{
...
using (MemoryStream msData = new MemoryStream())
{
xdoc.Save(msData);
msData.Position = 0;
ourCxp.FeedData(msData); // Memory stream is not expandable.
