"\r\n" appears as small square boxes in word document, C# - c#

I am appending some text containing '\r\n' to a Word document at run time.
But when I open the Word document, the line endings are shown as small square boxes :-(
I tried replacing them with System.Environment.NewLine, but I still see the small boxes.
Any ideas?

The answer is to use \v (vertical tab). Word treats it as a manual line break, the same one you get with Shift+Enter, rather than a new paragraph.
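A minimal sketch of that suggestion, assuming the text is normalized before it is appended to the document. The helper name ToWordLineBreaks is hypothetical and not part of any Word API; how the text is then appended depends on the library in use.

```csharp
using System;

// Hypothetical helper: convert every newline variant to '\v' (vertical tab, U+000B),
// which Word displays as a manual line break instead of a square box.
static string ToWordLineBreaks(string text)
{
    // Replace "\r\n" first so a Windows newline becomes one break, not two.
    return text.Replace("\r\n", "\v").Replace("\n", "\v").Replace("\r", "\v");
}
```

The result can then be passed to whatever call appends text to the document in place of the original string.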

Have you tried one or the other in isolation, i.e. \r or \n? Word interprets them as a carriage return and a line feed respectively. Environment.NewLine is really meant for plain text files; Word handles those characters differently. You could also try a Ctrl+M sequence (which is the same character as \r). Try that, and if it does not work, please post the code.

Word uses the <w:br/> XML element for line breaks.

After much trial and error, here is a function that sets the text for a Word XML node and takes care of multiple lines:
using System;
using System.Security;
using System.Text;
using System.Xml;

//Sets the text for a Word XML <w:t> node
//If the text is multi-line, it replaces the single <w:t> node with multiple nodes,
//resulting in multiple Word XML lines
private static void SetWordXmlNodeText(XmlDocument xmlDocument, XmlNode node, string newText)
{
    //Is the text a single line or multiple lines?
    //Check for either character so that text using a bare "\n" is handled too
    if (newText.IndexOfAny(new[] { '\r', '\n' }) >= 0)
    {
        //The new text is a multi-line string, split it into individual lines
        var lines = newText.Split("\n\r".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        //And add XML nodes for each line so that Word XML will accept the new lines
        var xmlBuilder = new StringBuilder();
        for (int count = 0; count < lines.Length; count++)
        {
            //Ensure the "w" prefix is set correctly, otherwise docFrag.InnerXml will fail with an exception
            xmlBuilder.Append("<w:t xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">");
            //Escape XML special characters so that text such as "a < b" does not break docFrag.InnerXml
            xmlBuilder.Append(SecurityElement.Escape(lines[count]));
            xmlBuilder.Append("</w:t>");
            //Not the last line? Add a line break
            if (count != lines.Length - 1)
            {
                xmlBuilder.Append("<w:br xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\" />");
            }
        }
        //Create the XML fragment with the new multi-line structure
        var docFrag = xmlDocument.CreateDocumentFragment();
        docFrag.InnerXml = xmlBuilder.ToString();
        node.ParentNode.AppendChild(docFrag);
        //Remove the child node that originally held the single-line text
        node.ParentNode.RemoveChild(node);
    }
    else
    {
        //Text is not multi-line, let the existing node have the text
        node.InnerText = newText;
    }
}

Related

Finding list of objects that contain full or just part of searched string

I have a list of paragraphs. Each paragraph can contain text. I'm trying to search for a string that may sit entirely within a single paragraph, or be spread across multiple paragraphs; in the worst case each letter is in a different paragraph.
public List<WordParagraph> FindText(string text) {
    List<WordParagraph> list = new List<WordParagraph>();
    var found = false;
    Paragraph currentParagraph = null;
    foreach (var paragraph in this.Paragraphs) {
        //if (currentParagraph == null) {
        //    currentParagraph = paragraph._paragraph;
        //} else {
        //    if (currentParagraph != paragraph._paragraph) {
        //        found = false;
        //    }
        //}
        // paragraph.Text
        // logic missing to find text that can start within some paragraph.Text, but
        // can span across multiple paragraphs
        // for example searching for text "This Is MyTest" within 4 paragraphs that
        // may be written like
        // paragraph.Text = "Thi"
        // paragraph.Text = "s Is"
        // paragraph.Text = " MyTes"
        // paragraph.Text = "t"
    }
    return list;
}
I've tried some logic around a foreach over the characters in text, with a nested loop over the characters of paragraph.Text, but I couldn't get it right.
To give you a bit of background: consider a Word document that has a single sentence, one long sentence in which each word, or even each letter, is formatted differently (different font size, bold, underline or whatever). It looks like this:
What Word actually saves in the file is a single paragraph, but that paragraph has multiple "runs". Each run contains a Text element. Together the Text elements hold the text that you see in Word, but because the formatting can differ for each word, the text can be split across many small Text properties.
In my example I've simplified the logic, and for me each "run" is a paragraph with a text. So the list of WordParagraphs is the list of runs within the screenshot you see.
Now I need to find the string "I have that" in the whole sentence you see in Word. That means I need to go through all paragraphs, find the first letter that matches and then check whether the next letter matches as well; if not, I need to start again.
My brain is having a hard time expressing this logic in code.
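One way to avoid the letter-by-letter bookkeeping is to join the run texts, search the joined string once, and map the match back to the runs it overlaps. The following is only a sketch under assumptions: FindRunsContaining is a hypothetical helper, the runs are represented as plain non-null strings rather than WordParagraph objects, and only the first occurrence is reported.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: returns the indexes of the runs that hold any part of the
// first occurrence of `search`, or an empty list when there is no match.
static List<int> FindRunsContaining(IReadOnlyList<string> runTexts, string search)
{
    var result = new List<int>();
    if (string.IsNullOrEmpty(search)) return result;

    // Join all run texts and remember where each run starts in the joined string.
    var joined = new StringBuilder();
    var starts = new int[runTexts.Count];
    for (int i = 0; i < runTexts.Count; i++)
    {
        starts[i] = joined.Length;
        joined.Append(runTexts[i]);
    }

    int matchStart = joined.ToString().IndexOf(search, StringComparison.Ordinal);
    if (matchStart < 0) return result;
    int matchEnd = matchStart + search.Length; // exclusive

    // A run is part of the match when its character range overlaps the match range.
    for (int i = 0; i < runTexts.Count; i++)
    {
        int runStart = starts[i];
        int runEnd = runStart + runTexts[i].Length; // exclusive
        if (runStart < matchEnd && runEnd > matchStart)
        {
            result.Add(i);
        }
    }
    return result;
}
```

In the FindText method above, the returned indexes could be used to pick the matching entries out of this.Paragraphs, with paragraph.Text supplying the strings.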

about special character handling

I have written a program that, at one point, converts '\n' to '<br/>' tags as shown below. (That is, if the user enters a sentence on two lines, a '<br/>' is inserted between them, so that when the text is printed the HTML writer recognizes it and prints the sentence on two lines.)
if (!String.IsNullOrEmpty(dataValue))
{
    if (dataValue.Contains("\r\n"))
    {
        dataValue = dataValue.Replace("\r\n", "<br/>");
    }
    if (dataValue.Contains("\n"))
    {
        dataValue = dataValue.Replace("\n", "<br/>");
    }
}
table.Add("<%" + data.TagName + "%>", dataValue);
A few days later, a new requirement meant I had to handle special characters: when we print this into a Word .doc using the HTML writer, any '<', '>' or other HTML tag in the input stops the document from being written past that point, because it is interpreted as an HTML tag. As a solution I came up with this:
string character = (string)table[tag];
character = WebUtility.HtmlEncode(character);
subject = subject.Replace(tag, character);
body = body.Replace(tag, character);
Here I encode the HTML tags so that printing continues, but as a result part one no longer works. In part one a line break is converted to '<br/>' and printed as a line break in the paragraph, but with the new solution that tag is encoded too, so it is printed as a literal '<br/>' tag. How can I solve this?
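One possible way out is to change the order: encode the user's text first and insert the '<br/>' tags afterwards, so the tags the program adds are never seen by the encoder. This is a sketch only; ToHtmlWithLineBreaks is a hypothetical helper, and it assumes the later WebUtility.HtmlEncode call on the table value is dropped so the text is not encoded twice.

```csharp
using System;
using System.Net;

// Hypothetical helper: HTML-encode user input, then turn newlines into <br/> tags.
static string ToHtmlWithLineBreaks(string dataValue)
{
    if (string.IsNullOrEmpty(dataValue)) return dataValue;

    // Encode first, so user-typed '<', '>' and '&' become harmless entities.
    string encoded = WebUtility.HtmlEncode(dataValue);

    // Then add the real line-break tags; "\r\n" is handled before a bare "\n".
    return encoded.Replace("\r\n", "<br/>").Replace("\n", "<br/>");
}
```

The value stored in the table would then already be safe to substitute into subject and body as-is.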

Read the Word file, identify headings, and get the content

I want to compare the headings of a Word file with a string and, if one matches, display its content.
Suppose a Word file contains 2-4 paragraphs with headings. I want to compare each heading with the string and display the content, using C#.
I'm not sure how far you have gotten on the project, but this is how you can compare the heading to your selected string and then display the heading text.
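// YourString and HeaderString are placeholders for the search text and the document heading
char[] separ = new char[] { ' ' };
string[] yourSelectedHeaderText = YourString.Split(separ, StringSplitOptions.RemoveEmptyEntries);
string[] docHeader = HeaderString.Split(separ, StringSplitOptions.RemoveEmptyEntries);
// Compare only as many words as both arrays have, to avoid an IndexOutOfRangeException
int wordCount = Math.Min(docHeader.Length, yourSelectedHeaderText.Length);
for (int i = 0; i < wordCount; i++)
{
    if (docHeader[i] == yourSelectedHeaderText[i])
    {
        Console.WriteLine(docHeader[i]);
    }
}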
The first step is to put everything into arrays (or some other collection) that you can iterate over. The loop then walks through the heading words one by one, and the if statement inside it catches each word that matches the corresponding word of your selected string. The line inside the if statement writes the matching word to the console.

How can I replace line breaks with nothing/an empty string using the DocX Library?

I need to retain paragraph breaks in a .docx file, but get rid of linebreaks which are often in the wrong place when copying from one file to another (due to different page sizes, and when the font is changed).
Using the DocX Library, I'm trying this:
private void ReplaceLineBreaksWithBoo(string filename)
{
    List<string> lineBreaks;
    using (DocX document = DocX.Load(filename))
    {
        lineBreaks = document.FindUniqueByPattern("\n", System.Text.RegularExpressions.RegexOptions.None);
        if (lineBreaks.Count > 0)
        {
            foreach (string s in lineBreaks)
            {
                document.ReplaceText(s, string.Empty); // <-- or a space?
            }
        }
        document.Save();
    }
}
...but it doesn't work. "\n" is not the right thing to pass, I reckon; I don't know what I need for the first argument to the FindUniqueByPattern() method. Documentation is nil and the discussion forum there resembles Bodie, California:
I don't think you can do it using FindUniqueByPattern or FindAll. A newline is not represented by any symbol; it is stored as a paragraph with empty text. You can peek at the document's XML representation through the document.Xml property, where you'll see an empty line stored as a single <w:p> element.
Therefore you can search for paragraphs with empty text instead of searching for a newline character:
using (DocX document = DocX.Load(filename))
{
    // ToList() (requires using System.Linq) takes a snapshot before anything is removed
    var emptyLines = document.Paragraphs.Where(o => string.IsNullOrEmpty(o.Text)).ToList();
    foreach (var paragraph in emptyLines)
    {
        paragraph.Remove(false);
    }
    document.Save();
}

XmlException when loading an XML file with certain characters

I need to use the XmlDocument class to load an XML file:
var doc = new XmlDocument();
doc.Load(filename);
Unfortunately, I get an XmlException when my XML contains certain characters that I use to represent my data. In particular, I have a node like the following:
<rect data="string with invalid characters: † ¶"/>
So, the forbidden characters are: † and ¶.
How can I load the file without exceptions and leaving these characters in my XML file?
You'll need to replace those characters with numeric character references. Just as you replace > and < with & gt; and & lt;, you would replace each of those characters with a reference such as & #931;, using whichever code point matches the specific character (for example, † is & #8224; and ¶ is & #182;).
edit: I had to add a space after the & to avoid the editor actually picking up and interpreting the character. Just remove the space in use - you get the idea.
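If you control the code that writes the attribute values, the replacement can be automated. The sketch below is an assumption about how that might look, not part of any XML API: ToNumericCharacterReferences is a hypothetical helper that rewrites every non-ASCII character as a decimal reference, and it assumes the input contains no unpaired surrogates.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric character reference (for example, U+2020 becomes "&#8224;").
static string ToNumericCharacterReferences(string text)
{
    var sb = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] <= 127)
        {
            sb.Append(text[i]); // ASCII passes through unchanged
            continue;
        }

        // Read the full code point; a surrogate pair occupies two chars.
        int codePoint = char.ConvertToUtf32(text, i);
        if (char.IsHighSurrogate(text[i])) i++;

        sb.Append("&#")
          .Append(codePoint.ToString(CultureInfo.InvariantCulture))
          .Append(';');
    }
    return sb.ToString();
}
```

The helper does not escape <, > or & themselves, so it is meant to run alongside the usual XML escaping rather than instead of it.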
Alternatively, if you have no control over the source of the XML and just need to read all of the values into a database or something, you could use an XmlTextReader to read through the XML node by node, stop on the element you know may contain bad data, and read the chars of that element. I've had to do this in the past. Something like this:
static void Main(string[] args)
{
    var xtr = new XmlTextReader(""); // requires using System.Xml
    xtr.Normalization = false;
    while (xtr.Read())
    {
        if (xtr.IsStartElement("Row")) // My xml doc contains many row elements
        {
            var fields = new string[6];
            while (xtr.Read())
            {
                for (int i = 0; i < 6; i++) // I know my xml only has six child elements per row
                {
                    while (!xtr.IsStartElement())
                    {
                        xtr.Read(); // We're not interested in hitting the end elements
                    }
                    if (i == 1) // I know my special characters are in the second child element of my row
                    {
                        var charBuff = new char[255];
                        // I know there will be a maximum of 255 characters
                        int charsRead = xtr.ReadChars(charBuff, 0, 255);
                        // Keep only the characters actually read, not the unused part of the buffer
                        fields[i] = new string(charBuff, 0, charsRead);
                    }
                    else
                    {
                        fields[i] = xtr.ReadElementContentAsString();
                    }
                }
            }
        }
    }
}
