Migradoc & nested paragraphs - c#

I am using Ben Foster's Migradoc extensions to format a PDF document using Markdown syntax.
I am running into an issue when using headers or sub-lists (<hx> or <li> elements) within a list (a null reference exception is thrown). The issue is detailed here.
The root cause of the problem is that Migradoc does not support nested paragraphs.
Are there any possible workarounds to this issue?

You ask "Are there any possible workarounds to this issue?"
MigraDoc is able to create PDF and RTF. Does RTF (Word) support nested paragraphs?
Probably not. I think this is not a MigraDoc issue.
Nested lists are possible in MigraDoc, but may require changes in the extensions. IIRC there are limitations with respect to nesting when numbered lists are involved.
IMHO nested paragraphs do not make sense. MigraDoc supports AddFormattedText, which lets you use different formats within a single paragraph. Using it may require changes to the extensions and/or to the input given to the extensions.

Hey, I've been using Ben Foster's Migradoc extensions as well and had this same problem. This may not be perfect, but it worked well enough for me. Modify your HtmlConverter.cs as follows:
First, add a field to the class to track the nesting depth:
private int _nestedListLevel;
Next, add 2 new node handlers to the AddDefaultNodeHandlers() method:
nodeHandlers.Add("ul", (node, parent) =>
{
if (parent is Paragraph)
{
_nestedListLevel++;
return parent.Section;
}
_nestedListLevel = 0;
return parent;
});
nodeHandlers.Add("ol", (node, parent) =>
{
if (parent is Paragraph)
{
_nestedListLevel++;
return parent.Section;
}
_nestedListLevel = 0;
return parent;
});
Finally, change the "li" node handler to the following. NOTE: this removes some of the styling work that he did, but it made things less complicated for me and works just fine. You can re-add that stuff if you want.
nodeHandlers.Add("li", (node, parent) =>
{
var listStyle = node.ParentNode.Name == "ul"
? "UnorderedList"
: "OrderedList";
var section = (Section)parent;
var isFirst = node.ParentNode.Elements("li").First() == node;
var isLast = node.ParentNode.Elements("li").Last() == node;
var listItem = section.AddParagraph().SetStyle(listStyle);
if (listStyle == "UnorderedList")
{
listItem.Format.ListInfo.ListType = _nestedListLevel%2 == 1 ? ListType.BulletList2 : ListType.BulletList1;
}
else
{
listItem.Format.ListInfo.ListType = _nestedListLevel % 2 == 1 ? ListType.NumberList2 : ListType.NumberList1;
}
if (_nestedListLevel > 0)
{
listItem.Format.LeftIndent = String.Format(CultureInfo.InvariantCulture, "{0}in", _nestedListLevel*.75);
}
// disable continuation if this is the first list item
listItem.Format.ListInfo.ContinuePreviousList = !isFirst;
if (isLast)
_nestedListLevel--;
return listItem;
});
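The indent and style arithmetic used in the "li" handler above can be tried on its own, without MigraDoc. This is only a minimal sketch: the names ListIndentSketch, IndentFor and UsesSecondaryStyle are made up for illustration, and the 0.75in step per level is taken from the handler code.

```csharp
using System;
using System.Globalization;

static class ListIndentSketch
{
    // Left indent for a nesting level, formatted the same way as in the "li" handler.
    public static string IndentFor(int nestedListLevel)
    {
        return String.Format(CultureInfo.InvariantCulture, "{0}in", nestedListLevel * .75);
    }

    // Odd levels use the second bullet/number style, even levels the first one.
    public static bool UsesSecondaryStyle(int nestedListLevel)
    {
        return nestedListLevel % 2 == 1;
    }

    static void Main()
    {
        for (int level = 0; level <= 3; level++)
        {
            Console.WriteLine("level {0}: indent {1}, secondary style: {2}",
                level, IndentFor(level), UsesSecondaryStyle(level));
        }
    }
}
```

Using the invariant culture matters here: with a culture that uses a decimal comma, the unit string would come out as "0,75in".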

How to get list of fonts used within one PDF file and copy them to another?

I'm trying to convert a PDF 1.7 document to a PDF/A-3B one, and currently I need to get all fonts in the source document and copy them into the target (if this is actually the way to do it). So far I have the following:
for (int i = 1; i < source.GetNumberOfPdfObjects(); i++)
{
var obj = source.GetPdfObject(i);
if (!obj?.IsDictionary() ?? true)
continue;
var dict = obj as PdfDictionary;
if (dict == null)
continue;
if (PdfName.Font.Equals(dict.GetAsName(PdfName.Type)))
{
var fontDescriptor = dict.GetAsDictionary(PdfName.FontDescriptor);
if (fontDescriptor == null)
continue;
//What else?
}
}
But I got stuck trying to get the font.
Is this the way to get the fonts from one doc or is there an easier way? And how does one copy them into the new doc?
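To get all the fonts in the document and copy them into the target document, you need the following code (written against the Java API, so the method names are camelCase; the question's C# code uses the PascalCase equivalents):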
for (int i = 1; i <= pdfDocument.getNumberOfPdfObjects(); i++) {
PdfObject object = pdfDocument.getPdfObject(i);
if (object.isDictionary() && PdfName.Font.equals(((PdfDictionary)object).getAsName(PdfName.Type))) {
object.copyTo(targetDocument);
}
}
However, please don't expect that all the content will be preserved on pages etc. This code just does what you ask for - copy the fonts to the new document. Preserving content and references to fonts is much more complicated than just copying the fonts.
Also, don't expect that copying objects from an arbitrary PDF document into a document you claim to be PDF/A-3B-compliant will make that document compliant. This is simply not true. The PDF/A standard imposes many requirements, some of them on fonts, and your original document does not necessarily fulfil them.

Conditional new Break for multi-column docx file, C#

This is a follow-up question for Creating Word file from ObservableCollection with C#.
I have a .docx file with a Body that has 2 columns for its SectionProperties. I have a dictionary of foreign words with their translation. On each line I need [Word] = [Translation] and whenever a new letter starts it should be in its own line, with 2 or 3 line breaks before and after that letter, like this:
A
A-word = translation
A-word = translation
B
B-word = translation
B-word = translation
...
I structured this in a for loop, so that in every iteration I'm creating a new paragraph with a possible Run for the letter (if a new one starts), a Run for the word and a Run for the translation. So the Run with the first letter is in the same Paragraph as the word and translation Run and it appends 2 or 3 Break objects before and after the Text.
In doing so the second column can sometimes start with 1 or 2 empty lines. Or the first column on the next page can start with empty lines.
This is what I want to avoid.
So my question is, can I somehow check if the end of the page is reached, or the text is at the top of the column, so I don't have to add a Break? Or, can I format the Column itself so that it doesn't start with an empty line?
I have tried putting the letter Run in a separate, optional, Paragraph, but again, I find myself having to input line breaks and the problem remains.
In the spirit of my other answer you can extend the template capability.
Use the Productivity tool to generate a single page break object, something like:
private readonly Paragraph PageBreakPara = new Paragraph(new Run(new Break() { Type = BreakValues.Page}));
Make a helper method that finds containers of a text tag:
public IEnumerable<T> FindElements<T>(OpenXmlCompositeElement searchParent, string tagRegex)
where T: OpenXmlElement
{
var regex = new Regex(tagRegex);
return searchParent.Descendants()
.Where(e=>(!(e is OpenXmlCompositeElement)
&& regex.IsMatch(e.InnerText)))
.SelectMany(e =>
e.Ancestors()
.OfType<T>()
.Union(e is T ? new T[] { (T)e } : new T[] {} ))
.ToList(); // can skip, prevents reevaluations
}
And another one that duplicates a range from the document and deletes range:
public IEnumerable<T> DuplicateRange<T>(OpenXmlCompositeElement root, string tagRegex)
where T: OpenXmlElement
{
// tagRegex must describe exactly two tags, such as [pageStart] and [pageEnd]
// or [page] [/page] - or whatever pattern you choose
var tagElements = FindElements<T>(root, tagRegex);
var fromEl = tagElements.First();
var toEl = tagElements.Skip(1).First(); // throws exception if less than 2 el
// you may want to find a common parent here
// I'll assume you've prepared the template so the elements are siblings.
var result = new List<T>();
var step = fromEl.NextSibling();
while (step !=null && toEl!=null && step!=toEl){
// another method called DeleteRange will instead delete elements in that range within this loop
var copy = step.CloneNode(true); // CloneNode needs the 'deep' argument; true copies the children too
toEl.InsertAfterSelf(copy);
if (copy is T typed) result.Add(typed); // keep the result consistent with the IEnumerable<T> return type
step = step.NextSibling();
}
return result;
}
public IEnumerable<OpenXmlElement> ReplaceTag(OpenXmlCompositeElement parent, string tagRegex, string replacement){
var replaceElements = FindElements<OpenXmlElement>(parent, tagRegex);
var regex = new Regex(tagRegex);
foreach(var el in replaceElements){
el.InnerText = regex.Replace(el.InnerText, replacement);
}
return replaceElements;
}
Now you can have a document that looks like this:
[page]
[TitleLetter]
[WordTemplate][Word]: [Translation] [/WordTemplate]
[pageBreak]
[/page]
With that document you can duplicate the [page]..[/page] range, process it per letter and once you're out of letters - delete the template range:
// letter -> (word -> translation); the key and value types are assumed from the usage below
var vocabulary = new Dictionary<char, Dictionary<string, string>>();
foreach (var letter in vocabulary.Keys.OrderByDescending(c=>c)){
// in reverse order because the copy range comes after the template range
var pageTemplate = DuplicateRange<OpenXmlElement>(wordDocument,"\\[/?page\\]");
foreach (var p in pageTemplate.OfType<OpenXmlCompositeElement>()){
// the brackets must be escaped, otherwise the regex treats them as a character class
ReplaceTag(p, "\\[TitleLetter\\]",""+letter);
var pageBr = ReplaceTag(p, "\\[pageBreak\\]","");
if (pageBr.Any()){
foreach(var pbr in pageBr){
pbr.InsertAfterSelf(PageBreakPara.CloneNode(true));
}
}
var wordTemplateFound = FindElements<OpenXmlElement>(p, "\\[/?WordTemplate\\]");
if (wordTemplateFound.Any()){
foreach (var word in vocabulary[letter].Keys){
var wordTemplate = DuplicateRange<OpenXmlElement>(p, "\\[/?WordTemplate\\]")
.First(); // since it's a single paragraph template
ReplaceTag(wordTemplate, "\\[/?WordTemplate\\]","");
ReplaceTag(wordTemplate, "\\[Word\\]",word);
ReplaceTag(wordTemplate, "\\[Translation\\]",vocabulary[letter][word]);
}
}
}
}
...or something like it. A few notes:
Look into SdtElements if things start getting too complicated.
Don't use AltChunk, despite the popularity of that answer. It requires Word to open and process the file, so you can't use some library to make a PDF out of it.
Word documents are messy. The solution above should work (I haven't tested it), but the template must be carefully crafted, so make backups of your template often.
Making a robust document engine isn't easy (since Word is messy). Do the minimum you need and rely on the template being in your control (not user-editable).
The code above is far from optimized or streamlined. I've tried to condense it into the smallest footprint possible at the cost of presentability. There are probably bugs too :)
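The per-letter loop above expects the vocabulary to be grouped as letter, then word, then translation. A minimal sketch of building that shape from a flat word list might look like this; VocabularySketch and GroupByLetter are hypothetical names, the sample entries are made up, and sorted dictionaries are used only so the letters and words come out in alphabetical order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VocabularySketch
{
    // Groups flat word/translation pairs by the upper-cased first letter of the word.
    public static SortedDictionary<char, SortedDictionary<string, string>> GroupByLetter(
        IEnumerable<KeyValuePair<string, string>> entries)
    {
        var result = new SortedDictionary<char, SortedDictionary<string, string>>();
        foreach (var entry in entries.Where(e => !string.IsNullOrEmpty(e.Key)))
        {
            char letter = char.ToUpperInvariant(entry.Key[0]);
            if (!result.TryGetValue(letter, out var words))
            {
                words = new SortedDictionary<string, string>(StringComparer.OrdinalIgnoreCase);
                result[letter] = words;
            }
            words[entry.Key] = entry.Value;
        }
        return result;
    }

    static void Main()
    {
        // Made-up sample data, for illustration only.
        var entries = new Dictionary<string, string>
        {
            { "apple", "Apfel" }, { "bread", "Brot" }, { "ant", "Ameise" }
        };
        foreach (var group in GroupByLetter(entries))
        {
            Console.WriteLine(group.Key);
            foreach (var word in group.Value)
                Console.WriteLine("{0} = {1}", word.Key, word.Value);
        }
    }
}
```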

ObjectListView advanced search and filtering

I'm trying to search and filter results on a TreeListView object from the ObjectListView component. Currently, I'm implementing this in a C# (.NET 4.0) project which has the following classes:
MyAbstract, MyDir (inherits MyAbstract) and MyFile (inherits MyAbstract as well). These classes have the following properties: Name, Title, Speed, SpeedType.
I want to know how to correctly create a query-like filter for this list, for example:
Speed < 10 OR SpeedType == "RPM"
I could probably use LINQ for the condition itself, but my main problem is how to apply and manage the filter using the TreeListView. My main questions are:
How to create this kind of filtering on the TreeListView?
How to make the TreeListView display only the filtered results
How to make it save the original list to have a clear filter button.
This is how I currently setup my list:
public void Init()
{
Project.LoadDirectory();
treeListView1.SetObjects(new object[] { Project.Root });
treeListView1.CanExpandGetter = delegate(object x)
{
return (x is MyDir);
};
treeListView1.ChildrenGetter = delegate(object x)
{
return ((MyDir)x).Nodes;
};
olvColumn1.ImageGetter = new ImageGetterDelegate(this.TreeViewImageGetter);
}
I've looked over the documentation but it's still unclear to me.
What have you tried?
This will filter the TreeListView to only show MyFile objects that match the condition you gave in your question:
this.treeListView.ModelFilter = new ModelFilter(delegate(object x) {
var myFile = x as MyFile;
return myFile != null && (myFile.Speed < 10 || myFile.SpeedType == "RPM"); // check the cast result, not x, so non-MyFile objects don't throw
});
To stop filtering, just clear the filter again:
this.treeListView.ModelFilter = null;
The demo that comes with the project shows all this behaviour.
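The condition itself can be checked independently of the control. The following is a minimal sketch that does not reference ObjectListView at all: MyFile here is a hypothetical stand-in for the asker's class (only the properties needed by the filter are modeled), and FilterSketch.Matches is an illustrative name for a predicate with the same shape as the delegate passed to ModelFilter.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the asker's MyFile class.
class MyFile
{
    public string Name { get; set; }
    public double Speed { get; set; }
    public string SpeedType { get; set; }
}

static class FilterSketch
{
    // The condition from the question: Speed < 10 OR SpeedType == "RPM".
    // Nulls and objects that are not MyFile are rejected instead of throwing.
    public static bool Matches(object x)
    {
        var myFile = x as MyFile;
        return myFile != null && (myFile.Speed < 10 || myFile.SpeedType == "RPM");
    }

    static void Main()
    {
        var items = new object[]
        {
            new MyFile { Name = "slow", Speed = 5, SpeedType = "KMH" },
            new MyFile { Name = "fast", Speed = 50, SpeedType = "KMH" },
            new MyFile { Name = "rotor", Speed = 3000, SpeedType = "RPM" },
            "not a file"
        };
        foreach (var file in items.Where(Matches).Cast<MyFile>())
            Console.WriteLine(file.Name);
    }
}
```

Keeping the predicate in a named method like this makes it easy to unit test and to swap for another condition later.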

How to remove last children from stack panel in WPF?

I am adding children to my StackPanel dynamically. In certain scenarios I want to remove the last child. Is there any way to get the last child?
Here is my code:
var row = new somecontrol();
stackpanel.Children.Add(row);
Is there any possible way to remove children.lastOrDefault()?
stackpanel.Children.Last();
Any help would be appreciated. Thanks in advance.
How about:
if(stackpanel.Children.Count != 0)
stackpanel.Children.RemoveAt(stackpanel.Children.Count - 1);
...or if you want to use LINQ, use the OfType<> extension method. Then you can do whatever you wish with LINQ, like LastOrDefault:
var child = stackpanel.Children.OfType<UIElement>().LastOrDefault();
if(child != null)
stackpanel.Children.Remove(child);
But, the first is probably fastest.
Or you can make your own Extension method, if you want:
static class PanelExtensions // extension methods must be declared in a static class
{
public static void RemoveLast(this Panel panel)
{
if(panel.Children.Count != 0)
panel.Children.RemoveAt(panel.Children.Count - 1);
}
}
Use it like this (the extension is defined on Panel, so call it on the panel itself, not on its Children collection):
stackpanel.RemoveLast();
But, as Xeun mentions, an MVVM solution with bindings would be preferable.
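The same "remove the last item if there is one" logic can be written against the non-generic IList interface, which UIElementCollection implements. This is only a sketch that runs without WPF; ListExtensions and the bool return value are illustrative choices, not part of the answers above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class ListExtensions
{
    // Removes the last item if the list is not empty; returns whether anything was removed.
    public static bool RemoveLast(this IList list)
    {
        if (list.Count == 0)
            return false;
        list.RemoveAt(list.Count - 1);
        return true;
    }
}

static class RemoveLastDemo
{
    static void Main()
    {
        var rows = new List<string> { "row1", "row2", "row3" };
        rows.RemoveLast();
        Console.WriteLine(string.Join(", ", rows));
    }
}
```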

OpenXml: Worksheet Child Elements change in ordering results in a corrupt file

I am trying to use Open XML to produce automated Excel files. One problem I am facing is reconciling my object model with the Open XML object model for Excel. I have come to realise that the order in which I append the child elements of a worksheet matters.
For Example:
workSheet.Append(sheetViews);
workSheet.Append(columns);
workSheet.Append(sheetData);
workSheet.Append(mergeCells);
workSheet.Append(drawing);
The above ordering does not give any error.
But the following:
workSheet.Append(sheetViews);
workSheet.Append(columns);
workSheet.Append(sheetData);
workSheet.Append(drawing);
workSheet.Append(mergeCells);
gives an error.
So this doesn't let me create a drawing object whenever I want and append it to the worksheet; it forces me to create these elements before using them.
Can anyone tell me if I have understood the problem correctly? I believe we should be able to open any Excel file, create a new child element for a worksheet if necessary, and append it. But this might break the order in which these elements are supposed to be appended.
Thanks.
According to the Standard ECMA-376 Office Open XML File Formats, the children of CT_Worksheet must appear in a required sequence, and in that sequence mergeCells comes before drawing.
The reason the following is crashing:
workSheet.Append(sheetViews);
workSheet.Append(columns);
workSheet.Append(sheetData);
workSheet.Append(drawing);
workSheet.Append(mergeCells);
is that you have drawing before mergeCells. As long as you append mergeCells before drawing, your code should work fine.
Note: You can find the full XSD in ECMA-376 3rd edition Part 1 (.zip) -> OfficeOpenXML-XMLSchema-Strict -> sml.xsd.
I found that for all "singleton" children for which the parent object defines a property (such as Worksheet.sheetViews), you should assign the new object to that property instead of using Append. This makes the class itself ensure that the order is correct.
workSheet.Append(sheetViews);
workSheet.Append(columns);
workSheet.Append(sheetData); // bad idea(though it does work if the order is good)
workSheet.Append(drawing);
workSheet.Append(mergeCells);
More correct format...
workSheet.sheetViews=sheetViews; // order doesn't matter.
workSheet.columns=columns;
...
As Joe Masilotti already explained, the order is defined in the schema.
Unfortunately, the OpenXML library does not ensure the correct order of child elements in the serialized XML as required by the underlying XML schema. Applications may not be able to parse the XML successfully if the order is not correct.
Here is a generic solution which I am using in my code:
private T GetOrCreateWorksheetChildCollection<T>(Spreadsheet.Worksheet worksheet)
where T : OpenXmlCompositeElement, new()
{
T collection = worksheet.GetFirstChild<T>();
if (collection == null)
{
collection = new T();
if (!worksheet.HasChildren)
{
worksheet.AppendChild(collection);
}
else
{
// compute the positions of all child elements (existing + new collection)
List<int> schemaPositions = worksheet.ChildElements
.Select(e => _childElementNames.IndexOf(e.LocalName)).ToList();
int collectionSchemaPos = _childElementNames.IndexOf(collection.LocalName);
schemaPositions.Add(collectionSchemaPos);
schemaPositions = schemaPositions.OrderBy(i => i).ToList();
// now get the index where the position of the new child is
int index = schemaPositions.IndexOf(collectionSchemaPos);
// this is the index to insert the new element
worksheet.InsertAt(collection, index);
}
}
return collection;
}
// names and order of possible child elements according to the openXML schema
private static readonly List<string> _childElementNames = new List<string>() {
"sheetPr", "dimension", "sheetViews", "sheetFormatPr", "cols", "sheetData",
"sheetCalcPr", "sheetProtection", "protectedRanges", "scenarios", "autoFilter",
"sortState", "dataConsolidate", "customSheetViews", "mergeCells", "phoneticPr",
"conditionalFormatting", "dataValidations", "hyperlinks", "printOptions",
"pageMargins", "pageSetup", "headerFooter", "rowBreaks", "colBreaks",
"customProperties", "cellWatches", "ignoredErrors", "smartTags", "drawing",
"drawingHF", "picture", "oleObjects", "controls", "webPublishItems", "tableParts",
"extLst"
};
The method always inserts the new child element at the correct position, ensuring that the resulting document is valid.
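The index computation in that method can be illustrated without the Open XML SDK by working on element names only. This is a minimal sketch under two assumptions: the existing children are already in schema order, and the new element is not yet present. SchemaOrderSketch and FindInsertIndex are hypothetical names, and the order list is shortened to five entries for readability.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SchemaOrderSketch
{
    // Shortened version of the schema order list, for illustration only.
    public static readonly List<string> Order = new List<string>
    {
        "sheetViews", "cols", "sheetData", "mergeCells", "drawing"
    };

    // Returns the index at which newName must be inserted so the list stays in schema order.
    public static int FindInsertIndex(IList<string> existing, string newName)
    {
        int newPos = Order.IndexOf(newName);
        if (newPos < 0)
            throw new ArgumentException("Unknown element name: " + newName);
        // Every existing child that comes earlier in the schema stays in front of the new one.
        return existing.Count(name => Order.IndexOf(name) < newPos);
    }

    static void Main()
    {
        var children = new List<string> { "sheetViews", "sheetData", "drawing" };
        int index = FindInsertIndex(children, "mergeCells");
        children.Insert(index, "mergeCells");
        Console.WriteLine(string.Join(", ", children));
    }
}
```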
For those who end up here via Google like I did, the function below solves the ordering problem after the child element is inserted:
public static T ReorderChildren<T>(T element) where T : OpenXmlElement
{
Dictionary<Type, int> childOrderHashTable = element.GetType()
.GetCustomAttributes()
.Where(x => x is ChildElementInfoAttribute)
.Select( (x, idx) => new KeyValuePair<Type, int>(((ChildElementInfoAttribute)x).ElementType, idx))
.ToDictionary(x => x.Key, x => x.Value);
List<OpenXmlElement> reorderedChildren = element.ChildElements
.OrderBy(x => childOrderHashTable[x.GetType()])
.ToList();
element.RemoveAllChildren();
element.Append(reorderedChildren);
return element;
}
The generated types in the DocumentFormat.OpenXml library have custom attributes that can be used to reflect metadata from the OOXML schema. This solution relies on System.Reflection and System.Linq (i.e., it is not very fast) but eliminates the need to hardcode a list of strings to correctly order the child elements for a specific type.
I use this function after validation, on the ValidationErrorInfo.Node property, and it cleans up the newly created element by reference. That way I don't have to apply this method recursively across an entire document.
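The reordering step boils down to a stable sort by schema position. The sketch below shows that idea on plain element names instead of OpenXmlElement instances; ReorderSketch and Reorder are hypothetical names and the order list is passed in by the caller. LINQ's OrderBy is documented as a stable sort, so repeated elements keep their relative order. Names missing from the order list get index -1 and therefore sort to the front, which a real implementation would probably want to reject.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ReorderSketch
{
    // Stable reorder of child names by their position in schemaOrder.
    public static List<string> Reorder(IEnumerable<string> children, IList<string> schemaOrder)
    {
        return children.OrderBy(name => schemaOrder.IndexOf(name)).ToList();
    }

    static void Main()
    {
        var order = new List<string> { "sheetViews", "cols", "sheetData", "mergeCells", "drawing" };
        var children = new[] { "sheetViews", "drawing", "sheetData", "mergeCells" };
        Console.WriteLine(string.Join(", ", Reorder(children, order)));
    }
}
```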
helb's answer is beautiful - thank you for that, helb.
It has the slight drawback that it does not test if there are already problems with the order of child elements. The following slight modification makes sure there are no pre-existing problems when adding a new element (you still need his _childElementNames, which is priceless) and it's slightly more efficient:
private static int getChildElementOrderIndex(OpenXmlElement collection)
{
int orderIndex = _childElementNames.IndexOf(collection.LocalName);
if( orderIndex < 0)
throw new InvalidOperationException($"Internal: worksheet part {collection.LocalName} not found");
return orderIndex;
}
private static T GetOrCreateWorksheetChildCollection<T>(Worksheet worksheet) where T : OpenXmlCompositeElement, new()
{
T collection = worksheet.GetFirstChild<T>();
if (collection == null)
{
collection = new T();
if (!worksheet.HasChildren)
{
worksheet.AppendChild(collection);
}
else
{
int collectionSchemaPos = getChildElementOrderIndex(collection);
int insertPos = 0;
int lastOrderNum = -1;
for(int i=0; i<worksheet.ChildElements.Count; ++i)
{
int thisOrderNum = getChildElementOrderIndex(worksheet.ChildElements[i]);
if(thisOrderNum<lastOrderNum) // equal positions are allowed: some parts (e.g. conditionalFormatting) may occur more than once
throw new InvalidOperationException($"Internal: worksheet parts {_childElementNames[lastOrderNum]} and {_childElementNames[thisOrderNum]} out of order");
lastOrderNum = thisOrderNum;
if( thisOrderNum < collectionSchemaPos )
++insertPos;
}
// this is the index to insert the new element
worksheet.InsertAt(collection, insertPos);
}
}
return collection;
}
