Why is wrapping inconsistent in C# Excel Interop? - c#

Using this code:
var avgOrderAmountHeaderCell = (Range)_xlSheetDelPerf.Cells[DEL_PERF_COLUMN_HEADER_ROW, AVG_ORDER_AMOUNT_COLUMN];
avgOrderAmountHeaderCell.Value2 = String.Format("Avg Order{0}Amount", Environment.NewLine);
avgOrderAmountHeaderCell.WrapText = true;
var avgPackageCountHeaderCell = (Range)_xlSheetDelPerf.Cells[DEL_PERF_COLUMN_HEADER_ROW, AVG_PACKAGE_AMOUNT_COLUMN];
avgPackageCountHeaderCell.Value2 = String.Format("Avg Package{0}Count", Environment.NewLine);
avgPackageCountHeaderCell.WrapText = true;
...I get this result (column header on left looks fine, while the one on the right has an "empty line" between the second and third wrapped line):
Why is this seemingly arbitrary inconsistent implementation of text wrapping taking place, and more importantly, how can I force no "blank lines" to appear between one line of text and another?

Related

NPOI XWPFDocument doesn't insert a continuous section break

I am trying to create a section that ends with a continuous section break, but every time I try to set the type of the break, it still applies a page break. My question: how do I insert a continuous section break?
The code:
CT_SectPr sect = tableParagraph.GetCTP().AddNewPPr().createSectPr();
sect.type = new CT_SectType();
sect.type.valSpecified = true;
sect.type.val = ST_SectionMark.continuous;

Remove text between two occurrences in Word with Interop

I have a piece of text tagged with ##ABC, so it looks like this:
Some text ##ABCtext to be found##ABC some text
I need to find and remove the ##ABCtext to be found##ABC with interop. So far, I've come up with the following code, which, however, seems to do nothing:
Microsoft.Office.Interop.Word.Range rng = document.Range();
rng.Find.ClearFormatting();
rng.Find.Replacement.ClearFormatting();
rng.Find.MatchWildcards = true;
rng.Find.Text = "##ABC(.*?)##ABC";
rng.Find.Replacement.Text = "";
rng.Find.Forward = true;
rng.Find.Wrap = Microsoft.Office.Interop.Word.WdFindWrap.wdFindStop;
rng.Find.Execute(Replace: Microsoft.Office.Interop.Word.WdReplace.wdReplaceAll);
What am I missing?

C# Stream keeps skipping first line

I'm doing something that should be rather simple, so I believe I'm overlooking something here.
I am using an HttpWebRequest and a WebResponse to detect whether a robots.txt exists on a server, and that works perfectly fine. I am then trying to do myList.Add(reader.ReadLine()); which mostly works, but the problem is that it keeps skipping the very first line.
I first noticed the problem with https://www.assetstore.unity3d.com/robots.txt, which I am using purely for testing. Look at that file to get an idea of what I'm talking about.
The first line is not added to my list either. I don't understand what's going on: I've tried looking this up, but everything I find is about skipping a line on purpose, and I don't want to do that.
My Code Below.
Console.WriteLine("Robots.txt Found: Presenting Rules in (Robot Rules).");
HttpWebRequest getResults = (HttpWebRequest)WebRequest.Create(ur + "robots.txt");
WebResponse getResponse = getResults.GetResponse();
using (StreamReader reader = new StreamReader(getResponse.GetResponseStream())) {
    string line = reader.ReadLine();
    while(line != null && line != "#") {
        line = reader.ReadLine();
        rslList.Add(line);
        results.Text = results.Text + line + Environment.NewLine; // At first I thought it might have been this (nope).
    }
    // This didn't work either (I figured it might be skipping because I had too many things going on),
    // so I just put it into a for loop. Nope, it still skips the first line.
    // for(int i = 0; i < rslList.Count; i++) {
    //     results.Text = results.Text + rslList[i] + Environment.NewLine;
    // }
}
// Close the connection since it is no longer needed.
getResponse.Close();
// Now check for user-rights.
CheckUserRights();
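Change the point at which you call ReadLine next. Your loop reads the following line before it has used the current one, so the first line is thrown away. Process the current line first, then read the next one at the end of the loop body:
var line = reader.ReadLine(); // read the first line
while(line != null && line != "#") { // while the line condition is satisfied
    // perform your desired actions
    rslList.Add(line);
    results.Text = results.Text + line + Environment.NewLine;
    line = reader.ReadLine(); // read the next line
}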
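The same read, process, read-again pattern can be tried without a network connection. The following is only a minimal sketch: the class and method names are made up for illustration, and a TextReader stands in for the response stream so that a StringReader can be used for testing.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineLoop
{
    // Hypothetical helper: collects lines until end of input
    // or until a line equal to stopMarker is reached.
    public static List<string> ReadUntil(TextReader reader, string stopMarker)
    {
        var lines = new List<string>();
        string line = reader.ReadLine();      // prime the loop with the first line
        while (line != null && line != stopMarker)
        {
            lines.Add(line);                  // use the line that was just checked
            line = reader.ReadLine();         // only then advance to the next line
        }
        return lines;
    }
}
```

Calling LineLoop.ReadUntil(new StringReader("User-agent: *\nDisallow: /private\n#\nignored"), "#") returns the first two lines, including the very first one, and never adds a trailing null to the list.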

When using MergeField FieldCodes in OpenXml SDK in C# why do field codes disappear or fragment?

I have been working successfully with the C# OpenXml SDK (Unofficial Microsoft Package 2.5 from NuGet) for some time now, but have recently noticed that the following line of code returns different results depending on what mood Microsoft Word appears to be in when the file gets saved:
var fields = document.Descendants<FieldCode>();
From what I can tell, when creating the document in the first place (using Word 2013 on Windows 8.1) if you use the Insert->QuickParts->Field and choose MergeField from the Field names left hand pane, and then provide a Field name in the field properties and click OK then the field code is correctly saved in the document as I would expect.
Then when using the aforementioned line of code I will receive a field code count of 1 field. If I subsequently edit this document (and even leave this field well alone) the subsequent saving could mean that this field code no longer is returned in my query.
Another case of the same curiousness is when I see the FieldCode nodes split across multiple items. So rather than seeing say:
" MERGEFIELD Author \\* MERGEFORMAT "
As the node name, I will see:
" MERGEFIELD Aut"
"hor \\* MERGEFORMAT"
Split as two FieldCode node values. I have no idea why this would be the case, but it certainly makes my ability to match nodes that much more exciting. Is this expected behaviour? A known bug? I don't really want to have to crack open the raw xml and edit this document to work until I understand what is going on. Many thanks all.
I came across this very problem myself, and found a solution that exists within OpenXML: a utility class called MarkupSimplifier which is part of the PowerTools for Open XML project. Using this class solved all the problems I was having that you describe.
The full article is located here.
Here are some pertinent excerpts:
Perhaps the most useful simplification that this performs is to merge adjacent runs with identical formatting.
It goes on to say:
Open XML applications, including Word, can arbitrarily split runs as necessary. If you, for instance, add a comment to a document, runs will be split at the location of the start and end of the comment. After MarkupSimplifier removes comments, it can merge runs, resulting in simpler markup.
An example of the utility class in use is:
SimplifyMarkupSettings settings = new SimplifyMarkupSettings
{
    RemoveComments = true,
    RemoveContentControls = true,
    RemoveEndAndFootNotes = true,
    RemoveFieldCodes = false,
    RemoveLastRenderedPageBreak = true,
    RemovePermissions = true,
    RemoveProof = true,
    RemoveRsidInfo = true,
    RemoveSmartTags = true,
    RemoveSoftHyphens = true,
    ReplaceTabsWithSpaces = true,
};
MarkupSimplifier.SimplifyMarkup(wordDoc, settings);
I have used this many times with Word 2010 documents using VS2015 .Net Framework 4.5.2 and it has made my life much, much easier.
Update:
I have revisited this code and found that it cleans up the runs in MERGEFIELD fields but not in IF fields that reference merge fields, e.g.
{if {MERGEFIELD When39} = "Y???" "Y" "N" }
I have no idea why this might be so, and examination of the underlying XML offers no hints.
Word will often split a text run into multiple text runs, for no reason I've ever understood. Before searching, comparing, tidying and so on, we preprocess the body with a method that combines adjacent runs with identical formatting into a single text run.
/// <summary>
/// Combines the identical runs.
/// </summary>
/// <param name="body">The body.</param>
public static void CombineIdenticalRuns(W.Body body)
{
    List<W.Run> runsToRemove = new List<W.Run>();
    foreach (W.Paragraph para in body.Descendants<W.Paragraph>())
    {
        List<W.Run> runs = para.Elements<W.Run>().ToList();
        for (int i = runs.Count - 2; i >= 0; i--)
        {
            W.Text text1 = runs[i].GetFirstChild<W.Text>();
            W.Text text2 = runs[i + 1].GetFirstChild<W.Text>();
            if (text1 != null && text2 != null)
            {
                string rPr1 = "";
                string rPr2 = "";
                if (runs[i].RunProperties != null) rPr1 = runs[i].RunProperties.OuterXml;
                if (runs[i + 1].RunProperties != null) rPr2 = runs[i + 1].RunProperties.OuterXml;
                if (rPr1 == rPr2)
                {
                    text1.Text += text2.Text;
                    runsToRemove.Add(runs[i + 1]);
                }
            }
        }
    }
    foreach (W.Run run in runsToRemove)
    {
        run.Remove();
    }
}
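Because a single field instruction can end up spread over several FieldCode (w:instrText) elements, it may help to reassemble the instruction per field before matching on it. The following is a minimal read-only sketch and is not taken from any of the answers here: it works on the raw WordprocessingML with System.Xml.Linq instead of the Open XML SDK, the class and method names are made up for illustration, and it simply concatenates every w:instrText found between a field's begin and end markers.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

public static class FieldCodeReader
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns one trimmed instruction string per complex field,
    // however many w:instrText fragments the instruction was split into.
    public static List<string> ReadFieldInstructions(XElement root)
    {
        var results = new List<string>();
        var open = new Stack<StringBuilder>();   // one builder per currently open field

        foreach (XElement el in root.Descendants())
        {
            if (el.Name == W + "fldChar")
            {
                string type = (string)el.Attribute(W + "fldCharType");
                if (type == "begin")
                {
                    open.Push(new StringBuilder());
                }
                else if (type == "end" && open.Count > 0)
                {
                    results.Add(open.Pop().ToString().Trim());
                }
            }
            else if (el.Name == W + "instrText" && open.Count > 0)
            {
                open.Peek().Append(el.Value);    // fragment belongs to the innermost open field
            }
        }
        return results;
    }
}
```

For the split example from the question, the two fragments " MERGEFIELD Aut" and "hor \* MERGEFORMAT" come back as the single string "MERGEFIELD Author \* MERGEFORMAT". Nested fields, such as an IF field containing a MERGEFIELD, are returned innermost first, and the outer instruction does not include the text of the inner one. The root element could, for example, be loaded with XDocument.Load from the stream of the main document part.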
I tried to simplify the document with PowerTools, but the result was a corrupted Word file. So I wrote this routine, which simplifies only field codes with specific names and works in all parts of the document (main document part, headers and footers):
internal static void SimplifyFieldCodes(WordprocessingDocument document)
{
    var masks = new string[] { Constants.VAR_MASK, Constants.INP_MASK, Constants.TBL_MASK, Constants.IMG_MASK, Constants.GRF_MASK };
    SimplifyFieldCodesInElement(document.MainDocumentPart.RootElement, masks);
    foreach (var headerPart in document.MainDocumentPart.HeaderParts)
    {
        SimplifyFieldCodesInElement(headerPart.Header, masks);
    }
    foreach (var footerPart in document.MainDocumentPart.FooterParts)
    {
        SimplifyFieldCodesInElement(footerPart.Footer, masks);
    }
}
internal static void SimplifyFieldCodesInElement(OpenXmlElement element, string[] regexpMasks)
{
    foreach (var run in element.Descendants<Run>()
        .Select(item => (Run)item)
        .ToList())
    {
        var fieldChar = run.Descendants<FieldChar>().FirstOrDefault();
        if (fieldChar != null && fieldChar.FieldCharType == FieldCharValues.Begin)
        {
            string fieldContent = "";
            List<Run> runsInFieldCode = new List<Run>();
            var currentRun = run.NextSibling();
            while ((currentRun is Run) && currentRun.Descendants<FieldCode>().FirstOrDefault() != null)
            {
                var currentRunFieldCode = currentRun.Descendants<FieldCode>().FirstOrDefault();
                fieldContent += currentRunFieldCode.InnerText;
                runsInFieldCode.Add((Run)currentRun);
                currentRun = currentRun.NextSibling();
            }
            // If there is more than one Run for the FieldCode, and it is one we must change,
            // put the complete text in the first Run and remove the rest
            if (runsInFieldCode.Count > 1)
            {
                // Check the field code to see whether it is one we must simplify (so as not to change TOC, PAGEREF, etc.)
                bool applyTransform = false;
                foreach (string regexpMask in regexpMasks)
                {
                    Regex regex = new Regex(regexpMask);
                    Match match = regex.Match(fieldContent);
                    if (match.Success)
                    {
                        applyTransform = true;
                        break;
                    }
                }
                if (applyTransform)
                {
                    var currentRunFieldCode = runsInFieldCode[0].Descendants<FieldCode>().FirstOrDefault();
                    currentRunFieldCode.Text = fieldContent;
                    runsInFieldCode.RemoveAt(0);
                    foreach (Run runToRemove in runsInFieldCode)
                    {
                        runToRemove.Remove();
                    }
                }
            }
        }
    }
}
Hope this helps!!!

Adding a new line to word using FindAndReplace in C#

Right now I'm exporting some text to a Word 2010 document. I have everything working except new lines. What is the character for a new line? I've tried "\r\n", " ^p ", and "\n", but nothing is working.
I'm using the "FindAndReplace" method to replace strings with strings.
The purpose for the newlines is some required formatting. My coworkers have a 6 line box that the text belongs in. On line 1 in that box I have "" and I'm replacing it with information from a database. If the information exceeds one line, they don't want the box to become 7 lines. So I've figured out how to calculate how many lines the text requires and I re-sized the box to 1 line. So for example if my string requires 2 lines, I want to put 4 blank lines after that.
If this is not possible, I was thinking of putting in that box:
<line1>
<line2>
<line3> and so on...
Then just replace each line individually. Any other thoughts?
Thanks in advance.
You can find each paragraph mark with ^13 (equivalent to ^p when wildcards are off; ^l matches a manual line break instead) and replace it with as many new lines as you require by concatenating ^13. Tested in Word 2010 with the German "Suchen und Ersetzen" (Search and Replace) dialog.
This should also work as-is through COM automation from C#.
Here's proof of concept code:
namespace StackWord
{
    using System;
    using StackWord = Microsoft.Office.Interop.Word;
    internal class Program
    {
        private static void Main(string[] args)
        {
            var myWord = new StackWord.Application { Visible = true };
            var myDoc = myWord.Documents.Add();
            var myParagraph = myDoc.Paragraphs.Add();
            myParagraph.Range.Text =
                "Example test one\rExample two\rExample three\r";
            // Search every story (main text, headers, footers, ...) of the document.
            foreach (StackWord.Range range in myWord.ActiveDocument.StoryRanges)
            {
                range.Find.Text = "\r";
                range.Find.Replacement.Text = "\r\r\r\r";
                range.Find.Wrap = StackWord.WdFindWrap.wdFindContinue;
                object replaceAll = StackWord.WdReplace.wdReplaceAll;
                if (range.Find.Execute(Replace: ref replaceAll))
                {
                    Console.WriteLine("Found and replaced.");
                }
            }
            Console.WriteLine("Press any key to close...");
            Console.ReadKey();
            myWord.Quit();
        }
    }
}
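For the fixed box height described in the question, the padding could be computed before the replacement text is handed to Word. This is only a sketch under assumptions: the helper name is made up, the number of lines the text occupies is taken as already known (the question says it has been calculated), and "\r" is used as the paragraph character in the same way as in the proof of concept above. Whether the padded box ends up exactly the right height also depends on the paragraph mark that already follows the placeholder in the document.

```csharp
using System;

public static class BoxPadding
{
    // Hypothetical helper (not part of the Word object model): appends paragraph
    // characters so that text occupying linesUsed lines is followed by enough
    // blank lines to reach totalLines. Text that already fills the box is unchanged.
    public static string PadToBoxHeight(string text, int linesUsed, int totalLines)
    {
        int missing = Math.Max(0, totalLines - linesUsed);
        return text + new string('\r', missing);
    }
}
```

With a 6-line box and text that needs 2 lines, BoxPadding.PadToBoxHeight(text, 2, 6) appends four "\r" characters, and the result could then be assigned to Find.Replacement.Text.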
You can always try using:
Environment.NewLine
If you save your Word file in the Word 97-2003 format (*.doc), your "FindAndReplace" method will work :D
