Attribute extraction from itext.layout.element in C#

I am working on a project that generates a table of contents for a PDF using iText. What I have is a list of elements (itext.layout.Element objects).
I have created a Dictionary<string, int> that stores each chapter title and its start page number. I want to treat
<p class="Heading2ANOC"> paragraphs, i.e. paragraphs whose class is Heading2ANOC, as chapter titles.
My code:
var toc = new Dictionary<string,int>();
foreach (IElement element in elements)
{
Console.WriteLine(element.GetType().Name);
if (element.GetType().Name == "HtmlPageBreak")
{
continue;
}
else if (element.GetType().Name == "Paragraph") // need a way to check whether the class is "Heading2ANOC"
{
int count = pdf.GetNumberOfPages();
toc.Add("section" + i, count);
i++;
}
document.Add((IBlockElement)element);
}
I am getting the elements with the following code:
string path = "path for the Html";
string html = File.ReadAllText(path);
IList<IElement> elements = HtmlConverter.ConvertToElements(html);
Example HTML element:
<div style="mso-element: para-border-div; border: solid #A6A6A6 2.25pt; padding: 3.0pt 4.0pt 3.0pt 4.0pt; background: #D9D9D9;">
<p class="Heading2ANOC"><span style="mso-bookmark: _Toc190800487;"><span style="mso-bookmark: _Toc377720650;"><span style="mso-bookmark: _Toc396995390;"><span style="font-size: 11.0pt; font-family: 'Open Sans',sans-serif; color: black; mso-color-alt: windowtext;">SECTION 1 <span style="mso-tab-count: 1;"> </span>Name of the section</span></span></span></span></p>
</div>

There is a cleaner (and more flexible) way to approach the task compared to the approach you are taking now, but it requires writing more code. Fortunately, the code is pretty basic.
To understand what needs to be customized, you need to understand how pdfHTML works a bit. Roughly speaking, it traverses the DOM tree in DFS order and converts the DOM tree into the element tree. Each tag is traversed by a tag worker and that tag worker produces an element as a result. The elements are flexible enough to contain any custom properties (as long as you use a unique property ID not used by iText), so you can set those properties in tag workers and use them later on. In this case you want to pass along class property/attribute.
First off, let's create a custom tag worker deriving from PTagWorker that will process all the paragraphs in the HTML and set a custom property:
public static readonly int CUSTOM_PROPERTY_ID = -10;
private class CustomPTagWorker : PTagWorker {
public CustomPTagWorker(IElementNode element, ProcessorContext context) : base(element, context) {
}
public override void ProcessEnd(IElementNode element, ProcessorContext context) {
base.ProcessEnd(element, context);
IPropertyContainer elementResult = GetElementResult();
if (elementResult != null && !String.IsNullOrEmpty(element.GetAttribute(AttributeConstants.CLASS))) {
elementResult.SetProperty(CUSTOM_PROPERTY_ID, element.GetAttribute(AttributeConstants.CLASS));
}
}
}
Then we need to use that tag worker somehow - for that we create a custom tag worker factory:
private class CustomTagWorkerFactory : DefaultTagWorkerFactory {
public override ITagWorker GetCustomTagWorker(IElementNode tag, ProcessorContext context) {
if (TagConstants.P.Equals(tag.Name().ToLower())) {
return new CustomPTagWorker(tag, context);
}
return base.GetCustomTagWorker(tag, context);
}
}
All we need to do now is make pdfHTML aware of those customizations by passing the custom tag worker factory in the converter properties:
ConverterProperties properties = new ConverterProperties().SetTagWorkerFactory(new CustomTagWorkerFactory());
To test it out, we can iterate over the elements and check for the presence of our custom property (instead of checking for names of the classes):
String html = "<p class=\"Heading2ANOC\">hello</p><p>world</p>";
ConverterProperties properties = new ConverterProperties().SetTagWorkerFactory(new CustomTagWorkerFactory());
IList<IElement> elements = HtmlConverter.ConvertToElements(html, properties);
foreach (IElement element in elements)
{
if (element.HasProperty(CUSTOM_PROPERTY_ID)) {
String propertyValue = element.GetProperty<String>(CUSTOM_PROPERTY_ID);
Console.WriteLine(propertyValue);
}
}
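Applied to the loop from the question, the check could look roughly like the sketch below. The class name TocSketch and the method name AddElementsAndCollectToc are made up for illustration; pdf, document and elements are assumed to be the objects from the question, and the property ID is assumed to be the one set by CustomPTagWorker. The page number is read the same way as in the question, and the class attribute is compared as an exact string, which is an assumption that breaks if a paragraph carries several classes.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;

public static class TocSketch
{
    // Assumed to match the property ID used by CustomPTagWorker.
    public const int CUSTOM_PROPERTY_ID = -10;

    // Hypothetical helper: adds the converted elements to the document and
    // records a TOC entry for every top-level paragraph tagged Heading2ANOC.
    public static Dictionary<string, int> AddElementsAndCollectToc(
        PdfDocument pdf, Document document, IList<IElement> elements)
    {
        var toc = new Dictionary<string, int>();
        int i = 1;
        foreach (IElement element in elements)
        {
            // Skips anything that is not a block element, which covers the
            // HtmlPageBreak case from the question (and also leaf elements).
            if (!(element is IBlockElement block))
            {
                continue;
            }

            if (element.HasProperty(CUSTOM_PROPERTY_ID)
                && element.GetProperty<string>(CUSTOM_PROPERTY_ID) == "Heading2ANOC")
            {
                // Same page lookup as in the question's code.
                toc.Add("section" + i, pdf.GetNumberOfPages());
                i++;
            }

            document.Add(block);
        }
        return toc;
    }
}
```

Only top-level elements are inspected here, so a heading wrapped in a div (as in the example HTML from the question) would not be found by this loop on its own.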
Please bear in mind that for more complicated HTMLs where elements nest into each other you might want to perform the final analysis in a different way, e.g.
foreach (IElement element in elements)
{
if (element is AbstractElement<Div>) {
var children = (element as AbstractElement<Div>).GetChildren();
// analyze children
}
}

Related

How to implement asp-append-version="true" to background-image property?

I am trying to apply the asp-append-version="true" tag helper attribute to my images.
The problem concerns the DOM structure: I am not assigning the attribute to an <img> tag but to a <div> that shows the image through the background-image property.
Moreover, the div is generated before the whole DOM is loaded, and I don't know whether there is a different way of doing it.
One obvious option is to change the div to an img tag, but I don't want that because my design has to remain the same.
My JavaScript so far looks like this:
cardHTML += '<div asp-append-version="true" class="card littleCard" style="background-image: url(/Content/Img/Especialistas/LittleCard/' + especialista.idEspecialista + '.jpg' + ')' + '" >';
cardHTML += '</div>';
The asp-append-version="true" won't work on the div tag.
Any ideas on how to deal with this?
Thanks
You can create a custom TagHelper that targets all elements with an inline style attribute. The following example worked fine when I tried it, but if you want something more standard (similar to ImageTagHelper, ...), you can look into the base class UrlResolutionTagHelper. I'm not sure why it needs to be more complicated there; basically, it resolves the URL before processing it further. I tried a plain IFileVersionProvider and it works for relative paths as well (of course, the resolved path has to be under the current server's web root).
The following simple example works for attribute values of type HtmlString, which is the usual case. Some custom rendering may inject an IHtmlContent that is not an HtmlString; for such cases, refer to the source code of UrlResolutionTagHelper (copying the relevant code almost verbatim is fine):
//target only elements having an inline style attribute
[HtmlTargetElement(Attributes = "style")]
public class InlineStyleBackgroundElementTagHelper : TagHelper
{
readonly IFileVersionProvider _fileVersionProvider;
const string BACKGROUND_URL_PATTERN = "(background(?:-image)?\\s*:[^;]*url)(\\([^)]+\\))";
public InlineStyleBackgroundElementTagHelper(IFileVersionProvider fileVersionProvider)
{
_fileVersionProvider = fileVersionProvider;
}
//bind the asp-append-version property
[HtmlAttributeName("asp-append-version")]
public bool AppendsVersion { get; set; }
//inject ViewContext from the current request
[HtmlAttributeNotBound]
[ViewContext]
public ViewContext ViewContext { get; set; }
public override void Process(TagHelperContext context, TagHelperOutput output)
{
if (AppendsVersion)
{
if (output.Attributes.TryGetAttribute("style", out var styleAttr))
{
//the value here should be an HtmlString, so this basically
//gets the raw plain string of the style attribute's value
var inlineStyle = styleAttr.Value.ToString();
var basePath = ViewContext.HttpContext.Request.PathBase;
inlineStyle = Regex.Replace(inlineStyle, BACKGROUND_URL_PATTERN, m =>
{
//extract the background url contained in the inline style
var backgroundUrl = m.Groups[2].Value.Trim('(', ')', ' ');
//append the version
var versionedUrl = _fileVersionProvider.AddFileVersionToPath(basePath, backgroundUrl);
//format back the inline style with the versioned url
return $"{m.Groups[1]}({versionedUrl})";
}, RegexOptions.Compiled | RegexOptions.IgnoreCase);
output.Attributes.SetAttribute("style", inlineStyle);
}
}
}
}
Usage: just like you use asp-append-version on the other built-in tag helpers (as in your example).
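To see what the regular expression in the tag helper does to a style value, the rewriting step can be tried in isolation. This is only a sketch: BackgroundUrlDemo, Rewrite and the addVersion delegate are made-up names, and the delegate stands in for IFileVersionProvider.AddFileVersionToPath so that the snippet runs without ASP.NET Core.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackgroundUrlDemo
{
    // Same pattern as in the tag helper above.
    const string BACKGROUND_URL_PATTERN = "(background(?:-image)?\\s*:[^;]*url)(\\([^)]+\\))";

    // addVersion is a hypothetical stand-in for the file version provider.
    public static string Rewrite(string inlineStyle, Func<string, string> addVersion)
    {
        return Regex.Replace(inlineStyle, BACKGROUND_URL_PATTERN, m =>
        {
            // Group 2 is "(...)": strip the parentheses and surrounding spaces.
            var backgroundUrl = m.Groups[2].Value.Trim('(', ')', ' ');
            // Group 1 is everything up to and including "url".
            return $"{m.Groups[1]}({addVersion(backgroundUrl)})";
        }, RegexOptions.IgnoreCase);
    }
}
```

Calling BackgroundUrlDemo.Rewrite("background-image: url(/Content/Img/card.jpg); color: red", url => url + "?v=abc123") returns "background-image: url(/Content/Img/card.jpg?v=abc123); color: red". Note that a quoted URL such as url('/a.jpg') keeps its quotes after the Trim call, so the version provider would receive the quotes as part of the path.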

Umbraco 7 load content media picker in custom section

I've created a custom section in Umbraco 7 that references external urls, but have a requirement to extend it to use exactly the same functionality as the media picker from the 'Content' rich text editor. I don't need any other rich text functionality other than to load the media picker overlay from an icon, and select either an internal or external url.
I've tried to distil the umbraco source code, as well as trying various adaptations of online tutorials, but as yet I can't get the media picker to load.
I know that fundamentally I need:
Another angular controller to return the data from the content 'getall' method
An html section that contains the media picker overlay
A reference in the edit.html in my custom section to launch the overlay.
However, as yet I haven't been able to wire it all together, so any help much appreciated.
So, this is how I came up with the solution.....
The first win was that I discovered 2 excellent tutorial blog posts, upon the shoulders of which this solution stands, so much respect to the following code cats:
Tim Geyssons - Nibble postings:
http://www.nibble.be/?p=440
Markus Johansson - Enkelmedia
http://www.enkelmedia.se/blogg/2013/11/22/creating-custom-sections-in-umbraco-7-part-1.aspx
Create a model object to represent a keyphrase, which will be associated to a new, simple, ORM table.
The ToString() method allows a friendly name to be output on the front-end.
[TableName("Keyphrase")]
public class Keyphrase
{
[PrimaryKeyColumn(AutoIncrement = true)]
public int Id { get; set; }
public string Name { get; set; }
public string Phrase { get; set; }
public string Link { get; set; }
public override string ToString()
{
return Name;
}
}
Create an Umbraco 'application' that will register the new custom section by implementing the IApplication interface. I've called mine 'Utilities' and associated it to the utilities icon.
[Application("Utilities", "Utilities", "icon-utilities", 8)]
public class UtilitiesApplication : IApplication { }
The decorator allows us to supply a name, alias, icon and sort-order of the new custom section.
Create an Umbraco tree web controller that will allow us to create the desired menu behaviour for our keyphrases, and display the keyphrase collection from our database keyphrase table.
[PluginController("Utilities")]
[Umbraco.Web.Trees.Tree("Utilities", "KeyphraseTree", "Keyphrase", iconClosed: "icon-doc", sortOrder: 1)]
public class KeyphraseTreeController : TreeController
{
private KeyphraseApiController _keyphraseApiController;
public KeyphraseTreeController()
{
_keyphraseApiController = new KeyphraseApiController();
}
protected override TreeNodeCollection GetTreeNodes(string id, FormDataCollection queryStrings)
{
var nodes = new TreeNodeCollection();
var keyphrases = _keyphraseApiController.GetAll();
if (id == Constants.System.Root.ToInvariantString())
{
foreach (var keyphrase in keyphrases)
{
var node = CreateTreeNode(
keyphrase.Id.ToString(),
"-1",
queryStrings,
keyphrase.ToString(),
"icon-book-alt",
false);
nodes.Add(node);
}
}
return nodes;
}
protected override MenuItemCollection GetMenuForNode(string id, FormDataCollection queryStrings)
{
var menu = new MenuItemCollection();
if (id == Constants.System.Root.ToInvariantString())
{
// root actions
menu.Items.Add<CreateChildEntity, ActionNew>(ui.Text("actions", ActionNew.Instance.Alias));
menu.Items.Add<RefreshNode, ActionRefresh>(ui.Text("actions", ActionRefresh.Instance.Alias), true);
return menu;
}
else
{
menu.Items.Add<ActionDelete>(ui.Text("actions", ActionDelete.Instance.Alias));
}
return menu;
}
}
The class decorators and TreeController extension allow us to declare the web controller for our keyphrase tree, associate it to our Utilities custom section, as well as choose an icon and sort order.
We also declare an api controller (we'll get to that!), which will allow us access to our Keyphrase data object.
The GetTreeNodes method allows us to iterate the keyphrase data collection and return the resultant nodes to the view.
The GetMenuForNode method allows us to create the menu options we require for our custom section.
We state that if the node is the root (Utilities), then allow us to add child nodes and refresh the node collection.
However, if we are lower in the node tree (Keyphrase) then we only want users to be able to delete the node (ie the user shouldn't be allowed to create another level of nodes deeper than Keyphrase)
Create an api controller for our Keyphrase CRUD requests
public class KeyphraseApiController : UmbracoAuthorizedJsonController
{
public IEnumerable<Keyphrase> GetAll()
{
var query = new Sql().Select("*").From("keyphrase");
return DatabaseContext.Database.Fetch<Keyphrase>(query);
}
public Keyphrase GetById(int id)
{
var query = new Sql().Select("*").From("keyphrase").Where<Keyphrase>(x => x.Id == id);
return DatabaseContext.Database.Fetch<Keyphrase>(query).FirstOrDefault();
}
public Keyphrase PostSave(Keyphrase keyphrase)
{
if (keyphrase.Id > 0)
DatabaseContext.Database.Update(keyphrase);
else
DatabaseContext.Database.Save(keyphrase);
return keyphrase;
}
public int DeleteById(int id)
{
return DatabaseContext.Database.Delete<Keyphrase>(id);
}
}
Create the custom section views with angular controllers, which is the current architectural style in Umbraco 7.
It should be noted that Umbraco expects your custom section components to be placed in the following structure: App_Plugins//BackOffice/ (in this example App_Plugins/Utilities/BackOffice/KeyphraseTree/, as the package manifest paths further down show)
We need a view to display and edit our keyphrase name, target phrase and url
<form name="keyphraseForm"
ng-controller="Keyphrase.KeyphraseEditController"
ng-show="loaded"
ng-submit="save(keyphrase)"
val-form-manager>
<umb-panel>
<umb-header>
<div class="span7">
<umb-content-name placeholder=""
ng-model="keyphrase.Name" />
</div>
<div class="span5">
<div class="btn-toolbar pull-right umb-btn-toolbar">
<umb-options-menu ng-show="currentNode"
current-node="currentNode"
current-section="{{currentSection}}">
</umb-options-menu>
</div>
</div>
</umb-header>
<div class="umb-panel-body umb-scrollable row-fluid">
<div class="tab-content form-horizontal" style="padding-bottom: 90px">
<div class="umb-pane">
<umb-control-group label="Target keyphrase" description="Keyphrase to be linked">
<input type="text" class="umb-editor umb-textstring" ng-model="keyphrase.Phrase" required />
</umb-control-group>
<umb-control-group label="Keyphrase link" description="Internal or external url">
<p>{{keyphrase.Link}}</p>
<umb-link-picker ng-model="keyphrase.Link" required/>
</umb-control-group>
<div class="umb-tab-buttons" detect-fold>
<div class="btn-group">
<button type="submit" data-hotkey="ctrl+s" class="btn btn-success">
<localize key="buttons_save">Save</localize>
</button>
</div>
</div>
</div>
</div>
</div>
</umb-panel>
</form>
This utilises umbraco and angular markup to display data input fields dynamically and associate our view to an angular controller that interacts with our data layer
angular.module("umbraco").controller("Keyphrase.KeyphraseEditController",
function ($scope, $routeParams, keyphraseResource, notificationsService, navigationService) {
$scope.loaded = false;
if ($routeParams.id == -1) {
$scope.keyphrase = {};
$scope.loaded = true;
}
else {
//get a keyphrase id -> service
keyphraseResource.getById($routeParams.id).then(function (response) {
$scope.keyphrase = response.data;
$scope.loaded = true;
});
}
$scope.save = function (keyphrase) {
keyphraseResource.save(keyphrase).then(function (response) {
$scope.keyphrase = response.data;
$scope.keyphraseForm.$dirty = false;
navigationService.syncTree({ tree: 'KeyphraseTree', path: [-1, -1], forceReload: true });
notificationsService.success("Success", keyphrase.Name + " has been saved");
});
};
});
Then we need html and corresponding angular controller for the keyphrase delete behaviour
<div class="umb-pane" ng-controller="Keyphrase.KeyphraseDeleteController">
<p>
Are you sure you want to delete {{currentNode.name}} ?
</p>
<div>
<div class="umb-pane btn-toolbar umb-btn-toolbar">
<div class="control-group umb-control-group">
<a href="" class="btn btn-link" ng-click="cancelDelete()">
<localize key="general_cancel">Cancel</localize>
</a>
<a href="" class="btn btn-primary" ng-click="delete(currentNode.id)">
<localize key="general_ok">OK</localize>
</a>
</div>
</div>
</div>
</div>
Utilise Umbraco's linkpicker to allow a user to select an internal or external url.
We need html markup to launch the LinkPicker
<div>
<ul class="unstyled list-icons">
<li>
<i class="icon icon-add blue"></i>
<a href ng-click="openLinkPicker()" prevent-default>Select</a>
</li>
</ul>
</div>
And an associated directive js file that launches the link picker and posts the selected url back to the html view
angular.module("umbraco.directives")
.directive('umbLinkPicker', function (dialogService, entityResource) {
return {
restrict: 'E',
replace: true,
templateUrl: '/App_Plugins/Utilities/umb-link-picker.html',
require: "ngModel",
link: function (scope, element, attr, ctrl) {
ctrl.$render = function () {
var val = parseInt(ctrl.$viewValue);
if (!isNaN(val) && angular.isNumber(val) && val > 0) {
entityResource.getById(val, "Content").then(function (item) {
scope.node = item;
});
}
};
scope.openLinkPicker = function () {
dialogService.linkPicker({ callback: populateLink });
}
scope.removeLink = function () {
scope.node = undefined;
updateModel(0);
}
function populateLink(item) {
scope.node = item;
updateModel(item.url);
}
function updateModel(id) {
ctrl.$setViewValue(id);
}
}
};
});
There is one final js file that allows us to send data across the wire, with everyone's favourite http verbs GET, POST (which also handles PUT here) and DELETE
angular.module("umbraco.resources")
.factory("keyphraseResource", function ($http) {
return {
getById: function (id) {
return $http.get("BackOffice/Api/KeyphraseApi/GetById?id=" + id);
},
save: function (keyphrase) {
return $http.post("BackOffice/Api/KeyphraseApi/PostSave", angular.toJson(keyphrase));
},
deleteById: function (id) {
return $http.delete("BackOffice/Api/KeyphraseApi/DeleteById?id=" + id);
}
};
});
In addition, we will need a package manifest to register our javascript behaviour
{
javascript: [
'~/App_Plugins/Utilities/BackOffice/KeyphraseTree/edit.controller.js',
'~/App_Plugins/Utilities/BackOffice/KeyphraseTree/delete.controller.js',
'~/App_Plugins/Utilities/keyphrase.resource.js',
'~/App_Plugins/Utilities/umbLinkPicker.directive.js'
]
}
Implement tweaks to allow the CMS portion of the solution to work correctly.
At this point we've almost got our custom section singing, but we just need to jump a couple more Umbraco hoops, namely
a) add a keyphrase event class that creates our keyphrase db table if it doesn't exist (see the KeyphraseEvents class further down)
b) fire up Umbraco and associate the new custom section to the target user (from the User menu)
c) alter the placeholder text for the custom section by searching for it in umbraco-->config-->en.xml and swapping out the placeholder text for 'Utilities'
Intercept target content fields of target datatypes when content is saved or published
The requirement I was given was to intercept the body content of a news article, so you'll need to create a document type in Umbraco that has, for example, a title field of type 'Textstring', and bodyContent field of type 'Richtext editor'.
You'll also want a, or many, keyphrase(s) to target, which should now be in a new Umbraco custom section, 'Utilities'
Here I've targeted the keyphrase 'technology news' to link to the bbc technology news site so that any time I write the phrase 'technology news' the href link will be inserted automatically.
This is obviously quite a simple example, but it would be quite powerful if a user needed to link to certain repetitive legal documents (tax, property, due diligence, for example), which could be hosted either externally or within the CMS itself. The href link will open an external resource in a new tab and an internal resource in the same window (we'll get to that in the keyphrase parser further down)
So, the principle of what we're trying to achieve is to intercept the Umbraco save event for a document and manipulate our rich text to insert our link. This is done as follows:
a) Establish a method (ContentServiceOnSaving) that will fire when a user clicks 'save', or 'publish and save'.
b) Target our desired content field to find our keyphrases.
c) Parse the target content html against our keyphrase collection to create our internal/external links.
NB: If you just want to get the custom section up and running, you only need the ApplicationStarted method to create the KeyPhrase table.
public class KeyphraseEvents : ApplicationEventHandler
{
private KeyphraseApiController _keyphraseApiController;
protected override void ApplicationStarted(UmbracoApplicationBase umbracoApplication,
ApplicationContext applicationContext)
{
_keyphraseApiController = new KeyphraseApiController();
ContentService.Saving += ContentServiceOnSaving;
var db = applicationContext.DatabaseContext.Database;
if (!db.TableExist("keyphrase"))
{
db.CreateTable<Keyphrase>(false);
}
}
private void ContentServiceOnSaving(IContentService sender, SaveEventArgs<IContent> saveEventArgs)
{
var keyphrases = _keyphraseApiController.GetAll();
var keyphraseContentParser = new KeyphraseContentParser();
foreach (IContent content in saveEventArgs.SavedEntities)
{
if (content.ContentType.Alias.Equals("NewsArticle"))
{
var blogContent = content.GetValue<string>("bodyContent");
var parsedBodyText = keyphraseContentParser.ReplaceKeyphrasesWithLinks(blogContent, keyphrases);
content.SetValue("bodyContent", parsedBodyText);
}
}
}
}
The ContentServiceOnSaving method allows us to intercept any save event in Umbraco. After that, we check our incoming content to see if it's of the type we're expecting (in this example 'NewsArticle'), and if it is, we target the 'bodyContent' section, parse this with our 'KeyphraseContentParser', and swap the current 'bodyContent' with the parsed 'bodyContent'.
Create a Keyphrase parser to swap keyphrases for internal/external links
public class KeyphraseContentParser
{
public string ReplaceKeyphrasesWithLinks(string htmlContent, IEnumerable<Keyphrase> keyphrases)
{
var parsedHtmlStringBuilder = new StringBuilder(htmlContent);
foreach (var keyphrase in keyphrases)
{
if (htmlContent.CaseContains(keyphrase.Phrase, StringComparison.OrdinalIgnoreCase))
{
var index = 0;
do
{
index = parsedHtmlStringBuilder.ToString()
.IndexOf(keyphrase.Phrase, index, StringComparison.OrdinalIgnoreCase);
if (index != -1)
{
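// clamp the length so a phrase at the very end of the text does not throw
var keyphraseSuffix = parsedHtmlStringBuilder.ToString(index, Math.Min(keyphrase.Phrase.Length + 4, parsedHtmlStringBuilder.Length - index));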
var keyPhraseFromContent = parsedHtmlStringBuilder.ToString(index, keyphrase.Phrase.Length);
var keyphraseTarget = "_blank";
if (keyphrase.Link.StartsWith("/"))
{
keyphraseTarget = "_self";
}
var keyphraseLinkReplacement = String.Format("<a href='{0}' target='{1}'>{2}</a>",
keyphrase.Link, keyphraseTarget, keyPhraseFromContent);
if (!keyphraseSuffix.Equals(String.Format("{0}</a>", keyPhraseFromContent)))
{
parsedHtmlStringBuilder.Remove(index, keyPhraseFromContent.Length);
parsedHtmlStringBuilder.Insert(index, keyphraseLinkReplacement);
index += keyphraseLinkReplacement.Length;
}
else
{
var previousStartBracket = parsedHtmlStringBuilder.ToString().LastIndexOf("<a", index);
var nextEndBracket = parsedHtmlStringBuilder.ToString().IndexOf("a>", index);
parsedHtmlStringBuilder.Remove(previousStartBracket, (nextEndBracket - (previousStartBracket - 2)));
parsedHtmlStringBuilder.Insert(previousStartBracket, keyphraseLinkReplacement);
index = previousStartBracket + keyphraseLinkReplacement.Length;
}
}
} while (index != -1);
}
}
return parsedHtmlStringBuilder.ToString();
}
}
It's probably easiest to step through the above code, but fundamentally the parser has to:
a) find and wrap all keyphrases, ignoring case, with a link to an internal CMS, or external web resource.
b) handle an already parsed html string to both leave links in place and not create nested links.
c) allow CMS keyphrase changes to be updated in the parsed html string.
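Requirement (a) on its own can be sketched with a regular expression. This is a simplification and not the parser above: SimpleKeyphraseLinker and WrapKeyphrase are made-up names, and the sketch deliberately ignores (b) and (c), so running it twice over the same text would nest links. It also matches the phrase anywhere in the string, including inside tags or attributes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleKeyphraseLinker
{
    // Wraps every case-insensitive occurrence of phrase in an anchor tag.
    public static string WrapKeyphrase(string html, string phrase, string link)
    {
        // Same rule as the parser above: links starting with "/" are internal.
        var target = link.StartsWith("/") ? "_self" : "_blank";

        return Regex.Replace(
            html,
            Regex.Escape(phrase), // treat the phrase as literal text
            m => $"<a href='{link}' target='{target}'>{m.Value}</a>", // m.Value keeps the original casing
            RegexOptions.IgnoreCase);
    }
}
```

For example, WrapKeyphrase("Read Technology News today", "technology news", "/news") returns "Read <a href='/news' target='_self'>Technology News</a> today"; the link value "/news" is just an illustrative internal path.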
The blog post about this, as well as the GitHub code, can be found via the links in the previous post.
Ok, so after finding some excellent helper posts and digging around I came up with the solution, which I've written about here:
http://frazzledcircuits.blogspot.co.uk/2015/03/umbraco-7-automatic-keyphrase.html
And the source code is here:
https://github.com/AdTarling/UmbracoSandbox

Duplicating content on save for a multilingual umbraco site

[Edit] I have actually been allowed to use the doc names, which makes it much easier but I still think it would be interesting to find out if it is possible.
I have to set a trigger to duplicate content to different branches of the content tree, as the site will be in several languages. I have been told that I cannot access the documents by name (as they may change) and I shouldn't use node IDs either (not that I would know how to; after a while it would become difficult to follow the structure).
How can I traverse the tree to insert the new document in the relevant sub branches in the other languages? Is there a way?
You can use the Document.AfterPublish event to catch the specific document object after it's been published. I would use this event handler to check the node type alias is one that you want copied, then you can call Document.MakeNew and pass the node ID of the new location.
This means you don't have to use a specific node ID or document name to trap an event.
Example:
using umbraco.cms.businesslogic.web;
using umbraco.cms.businesslogic;
using umbraco.BusinessLogic;
namespace MyWebsite {
public class MyApp : ApplicationBase {
public MyApp()
: base() {
Document.AfterPublish += new Document.PublishEventHandler(Document_AfterPublish);
}
void Document_AfterPublish(Document sender, PublishEventArgs e) {
if (sender.ContentType.Alias == "DoctypeAliasOfDocumentYouWantToCopy") {
int parentId = 0; // Change to the ID of where you want to create this document as a child.
Document d = Document.MakeNew("Name of new document", DocumentType.GetByAlias(sender.ContentType.Alias), User.GetUser(1), parentId);
foreach (var prop in sender.GenericProperties) {
d.getProperty(prop.PropertyType.Alias).Value = sender.getProperty(prop.PropertyType.Alias).Value;
}
d.Save();
d.Publish(User.GetUser(1));
}
}
}
}

Multi Level Tree in Umbraco Custom Section

I'm currently trying to create a custom tree and I'm running into trouble when trying to render a nodes children. After browsing various articles/posts I'm at this point:
public override void Render(ref XmlTree tree)
{
List<Node> articles = NodeUtil.GetAllNodesOfDocumentType(-1, "Promoter");
Node article = articles.Where(p => p.CreatorID == UmbracoEnsuredPage.CurrentUser.Id).FirstOrDefault();
if (article != null)
{
var dNode = XmlTreeNode.Create(this);
dNode.NodeID = article.Id.ToString();
dNode.Text = article.Name;
dNode.Icon = "doc.gif";
dNode.Action = "javascript:openArticle(" + article.Id + ")";
dNode.Source = article.Children.Count > 0 ? this.GetTreeServiceUrl("" + article.Id) : "";
tree.Add(dNode);
}
}
The code above gets the article belonging to the current user (for the sake of testing, each user only has one article at the moment). I then attempt to print out the children of this article, but instead of getting the desired output, I get the following:
Article Name
- Article Name
- Article Name
- Article Name
Each time I expand a node, it just seems to render the same node, and goes on and on.
I've seen other ways of using the treeservice, like:
TreeService treeService = new TreeService(...);
node.Source = treeService.GetServiceUrl();
But I get an error saying there is no GetServiceUrl method that takes 0 arguments. I assume the method above was for earlier versions?
It took me a while to work this out. Here is the solution, hope it would help someone.
const string PARENT_ID = "10"; // The ID of the node that has child nodes
public override void Render(ref XmlTree tree)
{
if (this.NodeKey == PARENT_ID) // Rendering the child nodes of the parent folder
{
// Render a child node
XmlTreeNode node = XmlTreeNode.Create(this);
node.NodeID = "11";
node.Text = "child";
node.Icon = "doc.gif";
node.Action = ...
tree.Add(node);
}
else // Default (Rendering the root)
{
// Render the parent folder
XmlTreeNode node = XmlTreeNode.Create(this);
node.NodeID = PARENT_ID;
node.Source = this.GetTreeServiceUrl(node.NodeID);
node.Text = "parent";
node.Icon = "folder.gif";
tree.Add(node);
}
}
The output suggests that the node tree you're building is nesting each child node - this is because the nodeId is being reset to -1 with each pass.
This post on our.umbraco.org describes the same problem, and suggests that you use NodeKey instead of ID to move between nodes.
Not necessarily helpful, but I would use the uQuery language extensions that come with the ucomponents package (and who installs Umbraco without ucomponents?) to simplify the method calls:
For example:
List<Node> articles = uQuery.getNodesByType("Promoter");
foreach(Node article in articles)
{
List<Node> children = article.GetDescendantNodes();
... build tree
}

How can I parse this HTML to get the content I want?

I am currently trying to parse an HTML document to retrieve all of the footnotes inside of it; the document contains dozens and dozens of them. I can't really figure out the expressions to use to extract all of the content I want. The thing is, the classes (e.g. "calibre34") are randomized in every document. The only way to see where the footnotes are located is to search for "hide"; the footnote is always the text afterwards, and it is closed with a </td> tag. Below is an example of one of the footnotes in the HTML document; all I want is the text. Any ideas? Thanks guys!
<td class="calibre33">1.<span><a class="x-xref" href="javascript:void(0);">
[hide]</a></span></td>
<td class="calibre34">
Among the other factors on which the premium would be based are the
average size of the losses experienced, a margin for contingencies,
a loading to cover the insurer's expenses, a margin for profit or
addition to the insurer's surplus, and perhaps the investment
earnings the insurer could realize from the time the premiums are
collected until the losses must be paid.</td>
Use HtmlAgilityPack to load the HTML document and then extract the footnotes with this XPath:
//td[contains(., '[hide]')]/following-sibling::td[1]
It first selects every td node whose text content includes [hide] and then moves to the td that immediately follows it. In the sample, the [hide] text sits inside a nested span and link, so an exact match such as text()='[hide]' on the td itself would not find it. Once you have this collection of nodes, you can extract their inner text in C# through HtmlAgilityPack.
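A minimal sketch of that approach with HtmlAgilityPack could look like this. FootnoteExtractor and GetFootnotes are made-up names, and the XPath uses contains(., '[hide]') on the assumption that, as in the sample, the [hide] text is nested inside a span and link within the first cell.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FootnoteExtractor
{
    // Returns the text of every td that directly follows a td containing "[hide]".
    public static List<string> GetFootnotes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var footnotes = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var cells = doc.DocumentNode.SelectNodes(
            "//td[contains(., '[hide]')]/following-sibling::td[1]");
        if (cells == null)
        {
            return footnotes;
        }

        foreach (HtmlNode cell in cells)
        {
            // InnerText drops the markup; DeEntitize decodes entities such as &amp;.
            footnotes.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
        }
        return footnotes;
    }
}
```

The line breaks inside each footnote are kept as they appear in the source HTML, so you may want to collapse whitespace afterwards.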
How about using MSHTML to parse the HTML source?
Here is the demo code. Enjoy.
public class CHtmlPraseDemo
{
private string strHtmlSource;
public mshtml.IHTMLDocument2 oHtmlDoc;
public CHtmlPraseDemo(string url)
{
GetWebContent(url);
oHtmlDoc = (IHTMLDocument2)new HTMLDocument();
oHtmlDoc.write(strHtmlSource);
}
public List<String> GetTdNodes(string TdClassName)
{
List<String> listOut = new List<string>();
IHTMLElement2 ie = (IHTMLElement2)oHtmlDoc.body;
IHTMLElementCollection iec = (IHTMLElementCollection)ie.getElementsByTagName("td");
foreach (IHTMLElement item in iec)
{
if (item.className == TdClassName)
{
listOut.Add(item.innerHTML);
}
}
return listOut;
}
void GetWebContent(string strUrl)
{
WebClient wc = new WebClient();
strHtmlSource = wc.DownloadString(strUrl);
}
}
class Program
{
static void Main(string[] args)
{
CHtmlPraseDemo oH = new CHtmlPraseDemo("http://stackoverflow.com/faq");
Console.Write(oH.oHtmlDoc.title);
List<string> l = oH.GetTdNodes("x");
foreach (string n in l)
{
Console.WriteLine("new td");
Console.WriteLine(n.ToString());
}
Console.Read();
}
}
