streamReader.ReadToEnd() return just header OpenXML - c#

Please, I want to find a word and replace it with another word in word doccument using openXML
  I use this method
public static void AddTextToWord(string filepath, string txtToFind,string ReplaceTxt)
{
WordprocessingDocument wordDoc = WordprocessingDocument.Open(filepath, true);
string docText = null;
StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream());
docText = sr.ReadToEnd();
System.Diagnostics.Debug.WriteLine(docText);
Regex regexText = new Regex(txt);
docText = regexText.Replace(docText,txt2);
StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create));
System.Diagnostics.Debug.WriteLine(docText);
wordDoc.Close();
}
but
docText
return just the head of the page that the xml shema of the document.
<?xml version="1.0" encoding=.......

Check Your Strings
If you want to replace a specific word or phrase within your existing content, you may just want to use the String.Replace() method as opposed to performing a Regex.Replace() which may not work as expected (as it expects a regular expression as opposed to a traditional string). This may not matter if you expect to use regular expressions, but it's worth noting.
Ensure You Are Pulling The Content
Word Documents are obviously not as easy to parse as plain text, so in order to get the actual "content", you may have to use an approach similar to the one mentioned here that targets the Document.Body properties instead of reading using a StreamReader object :
docText = wordDoc.MainDocumentPart.Document.Body.InnerText;
Performing Your Replacement
With that said, you currently appear to be reading the contents of your file and storing it in a string called docText. Since you have that string and know your values to find and replace, just call the Replace() method as seen below :
docText = docText.Replace(txtToFind,ReplaceTxt);
Writing Out Your Content
After performing the replacement, you'll just need to write your updated text out to a stream :
using (var sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}

Related

C# creating Word file - Error opening file

I am creating some word files (and replacing some words) from word template using this code snippet:
File.Copy(sourceFile, destinationFile, true);
try
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(destinationFile, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
foreach (KeyValuePair<string, string> item in keyValues)
{
Regex regexText = new Regex(item.Key);
docText = regexText.Replace(docText, item.Value);
}
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
But when I am trying to open the produced file I get this error:
We're sorry. We can't open XXX.docx because we found a problem with its contents
XML parsing error Location:Part:/word/document.xml, Line:2, Column:33686
Here are the contents of document.xml in the specific place:
.....<w:szCs w:val="20"/><w:lang w:val="en-US"/></w:rPr><w:t> </w:t></w:r><w:proofErr w:type="spellEnd"/>....
Where 33686 is the position of the &nbsp. How I can fix that problem?
EDIT In another file that produced correctly in the same position there are some random characters I used for testing which are used also in the title of the document
It looks like you're using regular expressions to directly modify XML, which is typically going to lead to difficult-to-troubleshoot issues like this, especially if any of your regexes match anything that could be interpreted as XML.
As an alternative, you may want to investigate that WordProcessDocument class more deeply. It looks like it has strongly-typed objects like Paragraph that you can modify more safely.

Replacing Template Fields in Word Documents with OpenXML

I am trying to create a templating system with OpenXML in our Azure app service-based application (so no Interop) and am running into issues with getting it to work. Here is the code I am currently working with (contents is a byte array):
using(MemoryStream stream = new MemoryStream(contents))
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(stream, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("<< Company.Name >>");
docText = regexText.Replace(docText, "Company 123");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
wordDoc.Save();
}
updated = stream.ToArray();
}
The search text is not being found/replaced, which I am assuming is because of the way everything is stored separately in the XML, but how would I go about replacing a field like this?
Thanks
Ryan
With OpenXML SDK you can use this SearchAndReplace - Youtube, note that it's a screen cast that shows the algorithm that can be used to accomplish the replacement over multiple <w:run> elements.
An alternative approach would be to use pure .NET solution, like this Find and Replace text in a Word document.
Last, the easiest and straightforward approach would be to use some other library, for instance check this example of Find and Replace with GemBox.Document.

Special character not reading in .txt file

I am using stream reader for reading a text file.
This is the contents of the .txt file:
</a> Schools's are a suitable public </a>
When I read that text I got:
<a>Schoolss are a suitable public<a>
As you can see I did't receive the quotation. How can I receive the special character in a stream reader?
I used following code:
using (StreamReader reader = new StreamReader(CommonGetSet.FileName, System.Text.Encoding.ASCII))
{
string text = reader.ReadToEnd();
docKeyword = XDocument.Parse(text);
}
The problem you are having is that you are trying to load a text file with an xml reader, i.e. this part:
XDocument.Load(reader);
If you look at this question: What characters do I need to escape in XML documents?, you will see other characters that will be stripped/need escaping too.
If you inspect the StreamReader in the debugger you will see it shows the correct text, something that the answer by #JinsPeter shows. So you need to read in a text file, the easiest way is to use either File.ReadAllText or File.ReadAllLines depending on whether you want the result as a string or string[] respectively:
string contents = File.ReadAllText(path);
string[] lines = File.ReadAllLines(path);
However, if for some reason you really want to use a StreamReader you can read directly from the stream using ReadToEnd, ReadLine or any other appropriate read method:
using (StreamReader reader = new StreamReader(path))
{
string contents = reader.ReadToEnd();
}
However, note that the StreamReader methods will read from the current position in the stream so you may need to set the position yourself.
For a list of other ways to read in a file in C# see this question: How to read an entire file to a string using C#?.
When I printed the same text inside the StreamReader I got the ' .
So the issue is with writing it to XML or HTML. Try to fix that rather than finding issue in StreamReader.
using (StreamReader inputStream = new StreamReader(filepath, System.Text.Encoding.UTF8))
{
string line = inputStream.ReadToEnd();
Console.WriteLine(line);
}

Inserting a doc file inplace of place holder

I have a word document which contain many pages. One of those pages contain a placeholder instead of other content. so I want to replace that placeholder with another doc file without losing formatting. This doc file which is to be replaced may have many pages. How can I replace that placeholder with this doc file programmatically.. I searched many but could not find any option to insert a doc file replacing a placeholder.. Thank You In Advance.
Or how can we copy the contents of doc to be inserted and then replace the placeholder with copied content
I found a post here.The below code is from that post.
With the library, you can do the following to replace text from a Word document, considering that documentByteArray is your document byte content taken from database:
using (MemoryStream mem = new MemoryStream())
{
mem.Write(documentByteArray, 0, (int)documentByteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
if instead of "Hi Everyone" if we replace it with a binarydata,which is an array of bytes
byte[] binarydata = File.ReadAllBytes(filepaths);
how can we modify the program?
First of all you should get a Nuget package called Novacode.Docx, this is what I have found to be the best Document creator and editor in the last few years.
using Novacode.Docx;
void Main()
{
var doc = DocX.Load(#"c:\temp\existingDoc.docx");
var docToAdd = DocX.Load(#"c:\temp\docToAdd.docx");
doc.InsertDocument(docToAdd, true); //version 1.0.0.22
doc.InsertDocument(docToAdd); //version 1.0.0.19
}
this is the most simple and basic implementation of what it is that youre after but this works.
for anything else take a look at the documentation at
https://docx.codeplex.com/
or
http://cathalscorner.blogspot.co.uk/
this will be the best place to start. I would also recommend that if you do use this one that you use the version 1.0.0.19 as there are some formatting issues in 1.0.0.22

Convert XML to Plain Text

My goal is to build an engine that takes the latest HL7 3.0 CDA documents and make them backward compatible with HL7 2.5 which is a radically different beast.
The CDA document is an XML file which when paired with its matching XSL file renders a HTML document fit for display to the end user.
In HL7 2.5 I need to get the rendered text, devoid of any markup, and fold it into a text stream (or similar) that I can write out in 80 character lines to populate the HL7 2.5 message.
So far, I'm taking an approach of using XslCompiledTransform to transform my XML document using XSLT and product a resultant HTML document.
My next step is to take that document (or perhaps at a step before this) and render the HTML as text. I have searched for a while, but can't figure out how to accomplish this. I'm hoping its something easy that I'm just overlooking, or just can't find the magical search terms. Can anyone offer some help?
FWIW, I've read the 5 or 10 other questions in SO which embrace or admonish using RegEx for this, and don't think that I want to go down that road. I need the rendered text.
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using System.Xml.XPath;
public class TransformXML
{
public static void Main(string[] args)
{
try
{
string sourceDoc = "C:\\CDA_Doc.xml";
string resultDoc = "C:\\Result.html";
string xsltDoc = "C:\\CDA.xsl";
XPathDocument myXPathDocument = new XPathDocument(sourceDoc);
XslCompiledTransform myXslTransform = new XslCompiledTransform();
XmlTextWriter writer = new XmlTextWriter(resultDoc, null);
myXslTransform.Load(xsltDoc);
myXslTransform.Transform(myXPathDocument, null, writer);
writer.Close();
StreamReader stream = new StreamReader (resultDoc);
}
catch (Exception e)
{
Console.WriteLine ("Exception: {0}", e.ToString());
}
}
}
Since you have the XML source, consider writing an XSL that will give you the output you want without the intermediate HTML step. It would be far more reliable than trying to transform the HTML.
This will leave you with just the text:
class Program
{
static void Main(string[] args)
{
var blah = new System.IO.StringReader(sourceDoc);
var reader = System.Xml.XmlReader.Create(blah);
StringBuilder result = new StringBuilder();
while (reader.Read())
{
result.Append( reader.Value);
}
Console.WriteLine(result);
}
static string sourceDoc = "<html><body><p>this is a paragraph</p><p>another paragraph</p></body></html>";
}
Or you can use a regular expression:
public static string StripHtml(String htmlText)
{
// replace all tags with spaces...
htmlText = Regex.Replace(htmlText, #"<(.|\n)*?>", " ");
// .. then eliminate all double spaces
while (htmlText.Contains(" "))
{
htmlText = htmlText.Replace(" ", " ");
}
// clear out non-breaking spaces and & character code
htmlText = htmlText.Replace(" ", " ");
htmlText = htmlText.Replace("&", "&");
return htmlText;
}
Can you use something like this which uses lynx and perl to render the html and then convert that to plain text?
This is a great use-case for XSL:FO and FOP. FOP isn't just for PDF output, one of the other major outputs that is supported is text. You should be able to construct a simple xslt + fo stylesheet that has the specifications (i.e. line width) that you want.
This solution will is a bit more heavy-weight that just using xml->xslt->text as ScottSEA suggested, but if you have any more complex formatting requirements (e.g. indenting), it will become much easier to express in fo, than mocking up in xslt.
I would avoid regexs for extracting the text. That's too low-level and guaranteed to be brittle. If you just want text and 80 character lines, the default xslt template will only print element text. Once you have only the text, you can apply whatever text processing is necessary.
Incidentally, I work for a company who produces CDAs as part of our product (voice recognition for dications). I would look into an XSLT that transforms the 3.0 directly into 2.5. Depending on the fidelity you want to keep between the two versions, the full XSLT route will probably be your easiest bet if what you really want to achieve is conversion between the formats. That's what XSLT was built to do.

Categories