OpenXML preserving formats on break lines (problems)

I'm having serious problems with line breaks when generating a Word document.
This is the library function I use to send text to a bookmark:
public void sentText(string _BkMk, string _text, bool _break, RunProperties _rProp)
{
    Text text = new Text(_text) { Space = SpaceProcessingModeValues.Preserve };
    Run run = new Run(new RunProperties(_rProp));
    run.Append(text);
    Run run2 = new Run();
    if (_break)
    {
        run2.Append(new Break());
        //CarriageReturn cr = new CarriageReturn();
        //run2.Append(cr);
    }
    foreach (BookmarkStart bookmarkStart in bookmarkMap.Values)
    {
        if (bookmarkStart.Name.Value == _BkMk)
        {
            bookmarkStart.InsertBeforeSelf(run);
            if (_break)
            {
                bookmarkStart.InsertBeforeSelf(run2);
            }
        }
    }
}
The RunProperties argument carries the font, size, etc.
The biggest problem is when I send different strings to the same bookmark and need to leave a blank line between them. I send an empty string, or a space like " ", and the result is an empty line, but with a different font (Times New Roman) and size (12). It is really important for me to preserve the font and size in these empty lines.
Any ideas?

If I understand your question correctly and all you want is a blank line, then all you have to do is insert a blank paragraph, and it should follow the default font that you have set up. This requires splitting your text across two different paragraphs with two different runs:
public void sentText(string _BkMk, string _text, bool _break, RunProperties _rProp)
{
    Text text = new Text(_text) { Space = SpaceProcessingModeValues.Preserve };
    Run run = new Run(new RunProperties(_rProp));
    run.Append(text);
    Paragraph paragraph1 = new Paragraph();
    paragraph1.Append(run);
    foreach (BookmarkStart bookmarkStart in bookmarkMap.Values)
    {
        if (bookmarkStart.Name.Value == _BkMk)
        {
            // Insert the text paragraph only once: an element that already
            // has a parent cannot be inserted a second time.
            bookmarkStart.InsertBeforeSelf(paragraph1);
            if (_break)
            {
                // An empty paragraph renders as a blank line.
                bookmarkStart.InsertBeforeSelf(new Paragraph());
            }
        }
    }
}
I would also recommend using paragraphs instead of just runs since Word will create an empty paragraph when you hit the enter key.
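One thing worth checking in the original function: run2, which carries the Break, is created without any RunProperties, so the break falls back to the document defaults. The following is only a minimal sketch, assuming the DocumentFormat.OpenXml package; BreakFormatting, CreateFormattedBreak and CreateFormattedEmptyParagraph are hypothetical helper names, not part of the SDK. It copies the caller's formatting onto the break run and onto the paragraph mark of an empty paragraph:

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helpers for illustration; the names are not part of the SDK.
public static class BreakFormatting
{
    // Builds a run that contains only a line break but carries a copy of the
    // given formatting, so the resulting empty line keeps the font and size.
    public static Run CreateFormattedBreak(RunProperties source)
    {
        var run = new Run();
        if (source != null)
        {
            // Clone so the caller's element is not moved into this run.
            run.Append(source.CloneNode(true));
        }
        run.Append(new Break());
        return run;
    }

    // Builds an empty paragraph whose paragraph mark carries a copy of the
    // same formatting, for the "blank paragraph" approach.
    public static Paragraph CreateFormattedEmptyParagraph(RunProperties source)
    {
        var markProperties = new ParagraphMarkRunProperties();
        if (source != null)
        {
            foreach (OpenXmlElement child in source.ChildElements)
            {
                markProperties.Append(child.CloneNode(true));
            }
        }
        return new Paragraph(new ParagraphProperties(markProperties));
    }
}
```

In sentText, run2 could then be created with BreakFormatting.CreateFormattedBreak(_rProp) instead of new Run(). Whether this fully fixes the font of the empty line may depend on the styles already present in the target document.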

Related

Aspose.Words ImportNode ignores font formatting when appending child

I am currently using Aspose.Words to open a document, pull the content between a bookmark start and a bookmark end, and then place that content into another document. The issue I'm having is that the ImportNode method imports the content into my document but changes all of the fonts from Calibri to Times New Roman and changes the font size from whatever it was in the original document to 12pt.
I obtain the content from the bookmark with the Aspose ExtractContent method.
Because ImportNode was stripping my font formatting, I tried making some adjustments and saving each node to an HTML string using ToString(HtmlSaveOptions). This mostly works, but it strips out the returns from the Word document, so none of my text has the appropriate spacing. My returns end up coming in as HTML in the following format:
"<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\"> </span></p>"
When using
DocumentBuilder.InsertHtml("<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\"> </span></p>");
it does not correctly add the return to the Word document.
Here is the code I'm using. Please forgive the comments etc.; they are left over from my attempts at correcting this.
public async Task<string> GenerateHtmlString(Document srcDoc, ArrayList nodes)
{
// Create a blank document.
Document dstDoc = new Document();
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Open"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
// Remove the first paragraph from the empty document.
dstDoc.FirstSection.Body.RemoveAllChildren();
// Create a new Builder for the temporary document that gets generated with the header or footer data.
// This allows us to control font and styles separately from the main document being built.
var newBuilder = new DocumentBuilder(dstDoc);
Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
htmlSaveOptions.ExportImagesAsBase64 = true;
htmlSaveOptions.SaveFormat = SaveFormat.Html;
htmlSaveOptions.ExportFontsAsBase64 = true;
htmlSaveOptions.ExportFontResources = true;
htmlSaveOptions.ExportTextBoxAsSvg = true;
htmlSaveOptions.ExportRoundtripInformation = true;
htmlSaveOptions.Encoding = Encoding.UTF8;
// Obtain all the links from the source document
// This is used later to add hyperlinks to the html
// because by default extracting nodes using Aspose
// does not pull in the links in a usable way.
var srcDocLinks = srcDoc.Range.Fields.GroupBy(x => x.DisplayResult).Select(x => x.First()).Where(x => x.Type == Aspose.Words.Fields.FieldType.FieldHyperlink).Distinct().ToList();
var childNodes = nodes.Cast<Node>().Select(x => x).ToList();
var oldBuilder = new DocumentBuilder(srcDoc);
oldBuilder.MoveToBookmark("Header");
var allchildren = oldBuilder.CurrentParagraph.Runs;
var allChildNodes = childNodes[0].Document.GetChildNodes(NodeType.Any, true);
var headerText = allChildNodes[0].Range.Bookmarks["Header"].BookmarkStart.GetText();
foreach (Node node in nodes)
{
var html = node.ToString(htmlSaveOptions);
try
{
// &#xa0; is used by Aspose because it works in XML
// If we see this character and the text of the node is \r we need to insert a break
if (html.Contains("&#xa0;") && node.Range.Text == "\r")
{
newBuilder.InsertHtml(html, false);
// Change the node into an HTML string
/*var htmlString = node.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
// Get all the child nodes of the html document
var allChildNodes = tempHtmlLinkDoc.DocumentNode.SelectNodes("//*");
// Loop over all child nodes so we can make sure we apply the correct font family and size to the break.
foreach (var childNode in allChildNodes)
{
// Get the style attribute from the child node
var childNodeStyles = childNode.GetAttributeValue("style", "").Split(';');
foreach (var childNodeStyle in childNodeStyles)
{
// Apply the font name and size to the new builder on the document.
if (childNodeStyle.ToLower().Contains("font-family"))
{
newBuilder.Font.Name = childNodeStyle.Split(':')[1].Trim();
}
if (childNodeStyle.ToLower().Contains("font-size"))
{
newBuilder.Font.Size = Convert.ToDouble(childNodeStyle.Split(':')[1]
.Replace("pt", "")
.Replace("px", "")
.Replace("em", "")
.Replace("rem", "")
.Replace("%", "")
.Trim());
}
}
}
// Insert the break with the corresponding font size and name.
newBuilder.InsertBreak(BreakType.ParagraphBreak);*/
}
else
{
// Loop through the source document links so the link can be applied to the HTML.
foreach (var srcDocLink in srcDocLinks)
{
if (html.Contains(srcDocLink.DisplayResult))
{
// Now that we know the html string has one of the links in it we need to get the address from the node.
var linkAddress = srcDocLink.Start.NextSibling.GetText().Replace(" HYPERLINK \"", "").Replace("\"", "");
//Convert the node into an HTML String so we can get the correct font color, name, size, and any text decoration.
var htmlString = srcDocLink.Start.NextSibling.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
var linkStyles = tempHtmlLinkDoc.DocumentNode.ChildNodes[0].GetAttributeValue("style", "").Split(';');
var linkStyleHtml = "";
foreach (var linkStyle in linkStyles)
{
if (linkStyle.ToLower().Contains("color"))
{
linkStyleHtml += $"color:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-family"))
{
linkStyleHtml += $"font-family:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-size"))
{
linkStyleHtml += $"font-size:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("text-decoration"))
{
linkStyleHtml += $"text-decoration:{linkStyle.Split(':')[1].Trim()};";
}
}
if (linkAddress.ToLower().Contains("mailto:"))
{
// Since the link has mailto included don't add the target attribute to the link.
html = new Regex($@"\b{srcDocLink.DisplayResult}\b").Replace(html, $"{srcDocLink.DisplayResult}");
//html = html.Replace(srcDocLink.DisplayResult, $"{srcDocLink.DisplayResult}");
}
else
{
// Since the links is not an email include the target attribute.
html = new Regex($@"\b{srcDocLink.DisplayResult}\b").Replace(html, $"{srcDocLink.DisplayResult}");
//html = html.Replace(srcDocLink.DisplayResult, $"{srcDocLink.DisplayResult}");
}
}
}
// Insert the HTML string into the temporary document.
newBuilder.InsertHtml(html, false);
}
}
catch (Exception ex)
{
throw;
}
}
// This is just for debugging/troubleshooting purposes and to make sure things look correct
string tempDocxPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.docx");
dstDoc.Save(tempDocxPath);
// We generate this HTML file then load it back up and pass the DocumentNode.OuterHtml back to the requesting method.
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Save"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
string tempHtmlPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.html");
dstDoc.Save(tempHtmlPath, htmlSaveOptions);
var tempHtmlDoc = new HtmlDocument();
tempHtmlDoc.Load(tempHtmlPath);
var htmlText = tempHtmlDoc.DocumentNode.OuterHtml;
// Clean up our mess...
if (File.Exists(tempDocxPath))
{
File.Delete(tempDocxPath);
}
if (File.Exists(tempHtmlPath))
{
File.Delete(tempHtmlPath);
}
// Return the generated HTML string.
return htmlText;
}

Saving each node to HTML and then inserting it into the destination document is not a good idea, because not all nodes can be properly saved to HTML and some formatting can be lost in the Aspose.Words DOM -> HTML -> Aspose.Words DOM round trip.
Regarding the original issue, the problem might occur because you are using ImportFormatMode.UseDestinationStyles. In that case the styles and defaults of the destination document are used, so the font might change. If you need to keep the source document formatting, use ImportFormatMode.KeepSourceFormatting.
If the problem occurs even with ImportFormatMode.KeepSourceFormatting, this must be a bug and you should report it to the Aspose.Words staff in the support forum.
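As a rough illustration of the suggestion above, the nodes returned by ExtractContent could be imported directly with KeepSourceFormatting instead of going through HTML. This is only a sketch, assuming the Aspose.Words package and that the extracted nodes are block-level (paragraphs or tables) that can be appended to a section body; BookmarkContentCopier and AppendNodes are hypothetical names.

```csharp
using System.Collections;
using Aspose.Words;

// Hypothetical helper for illustration; not part of Aspose.Words.
public static class BookmarkContentCopier
{
    // Imports each extracted node into dstDoc while keeping the source
    // formatting, then appends it to the body of the first section.
    public static void AppendNodes(Document srcDoc, Document dstDoc, ArrayList nodes)
    {
        // A single NodeImporter is reused so that styles are translated
        // consistently across all imported nodes.
        var importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);
        foreach (Node node in nodes)
        {
            Node imported = importer.ImportNode(node, true);
            dstDoc.FirstSection.Body.AppendChild(imported);
        }
    }
}
```

If the fonts still change with this mode, that would match the case described above where the behavior should be reported to Aspose.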

Itext7 PdfAction.CreateGoTo() Links not working in final document

I've parsed html into a PDF and created a table of contents from the Header tags. The bookmarks in the document work fine, but clicking on the line in the table of contents doesn't do anything. The cursor doesn't change icons like it does if I put a URL in the link.
I used Itext RUPS to inspect the final PDF and the named destinations are in the final file.
I tried hard coding a couple of the names in just to see what happens, but they also didn't work. Putting in .CreateURL and google.com works fine.
The one thing I'm doing that may or may not be an issue is I'm creating the body document, then creating the table of contents and merging the two documents.
Maybe Bruno can make a cameo on this one.
private static List ProcessOutlineChildren(PdfDocument pdfDocument, List tableOfContents, IEnumerable<PdfOutline> pdfOutlines, IDictionary<String, PdfObject> names = null)
{
List<TabStop> tabStops = new List<TabStop>();
tabStops.Add(new TabStop(580, TabAlignment.RIGHT));
foreach (var o in pdfOutlines)
{
ListItem currentOutlineItem = new ListItem();
Paragraph paragraph = new Paragraph();
paragraph.AddTabStops(tabStops);
paragraph.Add(o.GetTitle());
paragraph.Add(new Tab());
paragraph.Add((pdfDocument.GetPageNumber((PdfDictionary) o.GetDestination().GetDestinationPage(names))).ToString());
paragraph.SetAction(PdfAction.CreateGoTo(o.GetDestination()));
currentOutlineItem.Add(paragraph);
if (o.GetAllChildren().Any())
{
currentOutlineItem.Add(ProcessOutlineChildren(pdfDocument, new List(), o.GetAllChildren(), names));
}
tableOfContents.Add(currentOutlineItem);
}
return tableOfContents;
}
public class CustomOutlineHandler : OutlineHandler
{
//PDF's require a unique name for destinations, this is how the actions/bookmarks jump to a location.
protected override string GenerateUniqueDestinationName(IElementNode element)
{
string destinationName = base.GenerateUniqueDestinationName(element);
if ("p".Equals(element.Name()))
{
destinationName = destinationName.Replace(GetDestinationNamePrefix(), "paragraph-prefix-");
}
return destinationName;
}
}
//From my main method converting things into PDF.
OutlineHandler customOutlineHandler = new CustomOutlineHandler().PutAllTagPriorityMappings(priorityMappings);
customOutlineHandler.SetDestinationNamePrefix("destination-name-");
properties.SetOutlineHandler(customOutlineHandler);

Replace bookmarks content without removing the bookmark

I want to replace the text content of bookmarks without losing the bookmark.
foreach(Bookmark b in document.Bookmarks)
{
b.Range.Text = "newtext"; // text is set in document but bookmark is gone
}
I tried to set the new Range of the bookmark before the Text setting but I still have the same problem.
I also tried to re-add the bookmark with document.Bookmarks.Add(name, range); but I can't create an instance of range.

I had to re-add the bookmarks and save the range temporarily. I also had to keep a list of processed items to avoid an endless loop.
List<string> bookmarksProcessed = new List<string>();
foreach (Bookmark b in document.Bookmarks)
{
if (!bookmarksProcessed.Contains(b.Name))
{
string text = getTextFromBookmarkName(b.Name);
var newend = b.Range.Start + text.Length;
var name = b.Name;
Range rng = b.Range;
b.Range.Text = text;
rng.End = newend;
document.Bookmarks.Add(name, rng);
bookmarksProcessed.Add(name);
}
}

Looks like you solved your problem, but here is a cleaner way to do it:
using Office = Microsoft.Office.Core;
using Microsoft.Office.Tools.Word;
using System.Text.RegularExpressions;
using Word = Microsoft.Office.Interop.Word;
//declare and get the current document
Document extendedDocument = Globals.Factory.GetVstoObject(Globals.ThisAddIn.Application.ActiveDocument);
List<string> bookmarksProcessed = new List<string>();
foreach (Word.Bookmark oldBookmark in extendedDocument.Bookmarks)
{
    // Only handle bookmarks that have not been processed yet.
    if (!bookmarksProcessed.Contains(oldBookmark.Name))
    {
        string name = oldBookmark.Name;
        string newText = getTextFromBookmarkName(name);
        Word.Range newRange = oldBookmark.Range;
        Word.Bookmark newBookmark = extendedDocument.Controls.AddBookmark(newRange, name);
        // Setting Text on the VSTO Bookmark control replaces the text without
        // deleting the bookmark, so the range is not resized beforehand.
        newBookmark.Text = newText;
        bookmarksProcessed.Add(name);
        oldBookmark.Delete();
    }
}
Code isn't tested but should work.

With the above approach, you still lose the bookmark before it is added back in. If you really need to preserve the bookmark, I find that you can create an inner bookmark that wraps around the text (a bookmark within bookmark). After having the inner bookmark, you simply need to do:
innerBookmark.Range.Text = newText;
After the text replacing, the inner bookmark is gone and the outer bookmark is preserved. No need to set range.End.
You can create the inner bookmark manually or programmatically depending on your situation.

Different styling in a line in word processing with OpenXML in C#

I used the code below to apply two different character styles to the two runs of one paragraph:
Paragraph heading = new Paragraph();
ParagraphProperties heading_pPr = new ParagraphProperties();
heading.Append(heading_pPr);
Run Run1 = new Run() { RsidRunProperties = "009531B2" };
Text Text_Run1 = new Text("THIS IS TEST RUN 1");
Run1.Append(Text_Run1);
RunProperties rpr_Run1 = new RunProperties();
rpr_Run1.RunStyle = new RunStyle() { Val = "CharacterStyle1" };
Run Run2 = new Run();
RunProperties rpr_Run2 = new RunProperties();
rpr_Run2.RunStyle = new RunStyle() { Val = "CharacterStyle2" };
Text text_Run2 = new Text("THIS IS TEST RUN 2");
Run2.Append(text_Run2);
heading.Append(Run1);
heading.Append(Run2);
body.Append(heading);
But after running the code, these runs get the Normal style in the Word file.
I can apply a paragraph style to the paragraph, but I can't apply a character style to a run. What is wrong in my code?
In conclusion:
How can I apply a character style to a run, and how can I have a paragraph containing runs with different styling?

You need to specify the formatting for the paragraph in its properties section; otherwise it is going to fall back to the document's default, which in this case is Normal. This might also happen if your custom styles are not saved to the styles part of the document.
Change your code to:
Paragraph heading = new Paragraph();
ParagraphProperties heading_pPr = new ParagraphProperties();
heading.Append(heading_pPr);
ParagraphMarkRunProperties headingParagraphMarkRunProperties = new ParagraphMarkRunProperties();
RunStyle runStyle1 = new RunStyle(){ Val = "CharacterStyle1" };
headingParagraphMarkRunProperties.Append(runStyle1);
heading_pPr.Append(headingParagraphMarkRunProperties);
This will enable your paragraph to adopt your custom formatting. You still need to apply individual styles to the run elements to change their formatting, as in the rest of your code.
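For the run-level part, note that the question's snippet creates rpr_Run1 and rpr_Run2 but never appends them to Run1 and Run2, so the runs carry no style reference at all. A minimal sketch of attaching them, assuming the DocumentFormat.OpenXml package and that character styles with the IDs CharacterStyle1 and CharacterStyle2 exist in the styles part (StyledRuns, CreateStyledRun and CreateHeading are hypothetical helper names):

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helpers for illustration; the names are not part of the SDK.
public static class StyledRuns
{
    // Creates a run whose first child is a RunProperties element that
    // references the given character style ID.
    public static Run CreateStyledRun(string text, string characterStyleId)
    {
        var properties = new RunProperties(new RunStyle { Val = characterStyleId });
        var content = new Text(text) { Space = SpaceProcessingModeValues.Preserve };
        return new Run(properties, content);
    }

    // Builds one paragraph containing two runs with different character styles.
    public static Paragraph CreateHeading()
    {
        return new Paragraph(
            CreateStyledRun("THIS IS TEST RUN 1 ", "CharacterStyle1"),
            CreateStyledRun("THIS IS TEST RUN 2", "CharacterStyle2"));
    }
}
```

The paragraph returned by CreateHeading can be appended to the body as before with body.Append(...). The Val of RunStyle must match the style ID in the styles part, which can differ from the display name shown in Word.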

Microsoft Word Document Controls not accepting carriage returns

So, I have a Microsoft Word 2007 document with several Plain Text content controls (I have tried Rich Text as well) which accept input via XML.
For carriage returns, the string passed through XML contained "\r\n" wherever I wanted a carriage return, but the Word document ignored it and just kept wrapping things on the same line. I also tried replacing the \r\n with System.Environment.NewLine in my C# mapper, but that just put in \r\n anyway, which still didn't work.
Note also that on the control itself I have set "Allow Carriage Returns (Multiple Paragraphs)" in the control properties.
This is the XML for the listMapper
<Field id="32" name="32" fieldType="SimpleText">
<DataSelector path="/Data/DB/DebtProduct">
<InputField fieldType=""
path="/Data/DB/Client/strClientFirm"
link="" type=""/>
<InputField fieldType=""
path="strClientRefDebt"
link="" type=""/>
</DataSelector>
<DataMapper formatString="{0} Account Number: {1}"
name="SimpleListMapper" type="">
<MapperData></MapperData>
</DataMapper>
</Field>
Note that this is the listMapper C# where I actually map the list (notice where I try and append the system.environment.newline)
namespace DocEngine.Core.DataMappers
{
public class CSimpleListMapper:CBaseDataMapper
{
public override void Fill(DocEngine.Core.Interfaces.Document.IControl control, CDataSelector dataSelector)
{
if (control != null && dataSelector != null)
{
ISimpleTextControl textControl = (ISimpleTextControl)control;
IContent content = textControl.CreateContent();
CInputFieldCollection fileds = dataSelector.Read(Context);
StringBuilder builder = new StringBuilder();
if (fileds != null)
{
foreach (List<string> lst in fileds)
{
if (CanMap(lst) == false) continue;
if (builder.Length > 0 && lst[0].Length > 0)
builder.Append(Environment.NewLine);
if (string.IsNullOrEmpty(FormatString))
builder.Append(lst[0]);
else
builder.Append(string.Format(FormatString, lst.ToArray()));
}
content.Value = builder.ToString();
textControl.Content = content;
applyRules(control, null);
}
}
}
}
}
Does anybody have any clue at all how I can get MS Word 2007 (docx) to quit ignoring my newline characters??

Use a function like this:
private static Run InsertFormatRun(Run run, string[] formatText)
{
    for (int i = 0; i < formatText.Length; i++)
    {
        run.AppendChild(new Text(formatText[i]));
        // A Break is a child of the run itself (not of RunProperties);
        // add one between lines.
        if (i < formatText.Length - 1)
        {
            run.AppendChild(new Break());
        }
    }
    return run;
}

None of the above answers were any help for me.
However, I figured out that the InsertAfter method swaps the \n in the original XML string for \v, and when this is passed into the content control it renders correctly.
contentControl.MultiLine = true;
contentControl.Range.InsertAfter(yourString);

I had the same problem, but in a table cell.
I had one string with carriage returns (multiple lines) in a Text object that was appended to a paragraph, which was appended to a table cell.
=> The carriage returns were ignored by Word.
The solution was simple:
Create one paragraph per line and add all of these paragraphs to the table cell.

I think this works:
using (WordprocessingDocument _docx = WordprocessingDocument.Create("c:\\Test.docx", WordprocessingDocumentType.Document))
{
    // A newly created package has no main part yet, so add one.
    MainDocumentPart _part = _docx.AddMainDocumentPart();
    Body _body = new Body();
    string _str = "abc\ndef\ngeh";
    string[] _strArr = _str.Split('\n');
    foreach (string _line in _strArr)
    {
        // One paragraph per line, all in the same body.
        _body.Append(NewText(_line));
    }
    _part.Document = new Document(_body);
    _part.Document.Save();
}

static Paragraph NewText(string _text)
{
    Paragraph _head = new Paragraph();
    Run _run = new Run();
    Text _line = new Text(_text);
    _run.Append(_line);
    _head.Append(_run);
    return _head;
}
