Possible to get HtmlNode's position & length within original input? - c#

Consider the following HTML fragment (_ is used for whitespace):
<head>
...
<link ... ___/>
<!-- ... -->
...
</head>
I'm using Html Agility Pack (HAP) to read HTML files/fragments and to strip out links. What I want to do is find the LINK (and some other) elements and then replace them with whitespace, like so:
<head>
...
____________
<!-- ... -->
...
</head>
The parsing part seems to be working so far, I get the nodes I'm looking for. However, HAP tries to fix the HTML content while I need everything to be exactly the same, except for the changes I'm trying to make. Plus, HAP seems to have quite a few bugs when it comes to writing back content that was read in previously, so the approach I want to take is let HAP parse the input and then I go back to the original input and replace content that I don't want.
The problem is, HtmlNode doesn't seem to have an input length property. It has StreamPosition which seems to indicate where reading of the node's content started within the input but I couldn't find a length property that'd tell me how many characters were consumed to build the node.
I tried using the OuterHtml property but, unfortunately, HAP tries to fix the LINK by removing the ___/ part (a LINK element is not supposed to be closed). Because of this, OuterHtml.Length returns the wrong length.
Is there a way in HAP to get this information?

I ended up modifying the code of HtmlAgilityPack to expose a new property that returns the private _outerlength field of HtmlNode.
public virtual int OuterLength
{
    get
    {
        return ( _outerlength );
    }
}
This seems to be working fine so far.

If you want to achieve the same result without recompiling HAP, then use reflection to access the private variable.
I usually wouldn't recommend reflection to access private variables, but I recently had the exact same situation as this and used reflection, because I was unable to use a recompiled version of the assembly. To do this, create a static variable that holds the field info object (to avoid recreating it on every use):
private static readonly FieldInfo HtmlNodeOuterLengthFieldInfo = typeof(HtmlNode).GetField("_outerlength", BindingFlags.NonPublic | BindingFlags.Instance);
Then whenever you want to access the true length of the original outer HTML:
var match = htmlDocument.DocumentNode.SelectSingleNode("xpath");
var htmlLength = (int)HtmlNodeOuterLengthFieldInfo.GetValue(match);

Transformed @Xcalibur's answer into an extension method.
Note that HtmlNode has a property OuterLength, but it isn't the same as its private field _outerlength, which is what we need. (Reading other answers here, I first thought that HtmlAgilityPack had added OuterLength as a public property since 2013, which it did, but after some testing I noticed it simply returns the length of OuterHtml.) So we can either rebuild the package from source to expose the field as a public property, or use an extension method with reflection (which is slow).
Extension method
namespace HtmlAgilityPack
{
    public static class HtmlDocumentExtensions
    {
        private static readonly System.Reflection.FieldInfo HtmlNodeOuterLengthFieldInfo =
            typeof(HtmlNode).GetField("_outerlength", System.Reflection.BindingFlags.NonPublic
                | System.Reflection.BindingFlags.Instance);
        public static int GetOuterLengthInStream(this HtmlNode node) =>
            (int)HtmlNodeOuterLengthFieldInfo.GetValue(node ??
                throw new System.ArgumentNullException(nameof(node)));
    }
}
Because HtmlNode already has property OuterLength, to avoid ambiguity I called the method GetOuterLengthInStream().
Usage
node.GetOuterLengthInStream()
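Once a start offset and an original length are known for each node (for example from StreamPosition and the reflected _outerlength value), the replacement step itself needs no HAP calls. The following is only a minimal sketch: it assumes the offsets are character indexes into the original string, and SpanBlanker and BlankSpans are hypothetical names, not part of HAP.

```csharp
using System;
using System.Collections.Generic;

public static class SpanBlanker
{
    // Overwrites every character inside each (start, length) span with a space.
    // Characters outside the spans are left untouched; line breaks that fall
    // inside a span are overwritten as well.
    public static string BlankSpans(string input, IEnumerable<(int Start, int Length)> spans)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (spans == null) throw new ArgumentNullException(nameof(spans));

        char[] buffer = input.ToCharArray();
        foreach (var (start, length) in spans)
        {
            // Reject spans that do not fit inside the original input.
            if (start < 0 || length < 0 || start + length > buffer.Length)
                throw new ArgumentOutOfRangeException(nameof(spans));

            for (int i = start; i < start + length; i++)
                buffer[i] = ' ';
        }
        return new string(buffer);
    }
}
```

Because the output has exactly the same length as the input, offsets collected for other nodes stay valid after each replacement, so the spans can be processed in any order.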

Related

Accessing style attribute individually

I am using HtmlAgilityPack in C#. I created a div element with some attributes, like this:
HtmlNode div = HtmlNode.CreateNode("<div></div>");
div.Attributes.Add("style","width:100px;height:100px;color:red;position:absolute;");
Now I want to know whether there is any method in HtmlAgilityPack by which I can access the style attributes individually, like we do in jQuery:
$("div").width(); or $("div").css("width");
You could try using CsQuery, which is in fact like jQuery:
CQ div = CQ.Create("<div></div>");
div.CssSet( new {
width="100px",
height="100px",
color="red",
position="absolute"
});
//.. or
div.Css("width","100px").Css( ... ) ...
string width = div.Css("width"); // width=="100px"
int widthInt = div.Css<int>("width"); // widthInt==100
It implements every DOM manipulation method of jQuery, so the API should be very familiar. It also provides an implementation that mostly mimics the browser DOM, e.g.
var nodeName = div[0].NodeName; // nodeName=="DIV";
div[0].AppendChild(div.Document.CreateElement("span")); // add a span child
There are a couple exceptions, CssSet is one of them, where the overloaded methods in javascript didn't work out in C# so a different method had to be used. (The other one is AttrSet when setting from an object). It's also got extensive unit test coverage, including much of the test suite from jQuery ported to C#, and selectors are much faster than HTML Agility Pack (not to mention a lot less confusing since they're just CSS) thanks to a subselect-capable index.
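If switching libraries is not an option, the style attribute value can also be split by hand, since HAP hands it back as one string. This is a rough sketch rather than a real CSS parser: StyleParser and Parse are hypothetical names, and the naive split on ';' and ':' will misread values that themselves contain semicolons (for example inside url(...) or quoted strings).

```csharp
using System;
using System.Collections.Generic;

public static class StyleParser
{
    // Splits an inline style string such as "width:100px;color:red;"
    // into a case-insensitive property -> value dictionary.
    public static Dictionary<string, string> Parse(string style)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrWhiteSpace(style)) return result;

        foreach (string declaration in style.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first colon only, so values such as "a:b" survive.
            int colon = declaration.IndexOf(':');
            if (colon <= 0) continue;

            string name = declaration.Substring(0, colon).Trim();
            string value = declaration.Substring(colon + 1).Trim();
            if (name.Length > 0) result[name] = value; // a later declaration wins
        }
        return result;
    }
}
```

With the div from the question, StyleParser.Parse(div.GetAttributeValue("style", ""))["width"] would give "100px".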

In C#, how to dynamically get a member of a static class?

I am currently trying to develop a mobile barcode reader in Windows Phone 7.5 using the ZXing library.
Seeing that I am posting here, you might already have guessed that I am facing some kind of problem that I don't know any solution to.
The problem is the following:
The ZXing library allows for multiple barcode formats - however, I'd like to include a settings menu for the user to focus on one barcode specifically.
The BarcodeFormat-object is static and contains the readonly members (of type BarcodeFormat) that I want to assign.
Seeing and hearing that Reflection is THE powerful weapon for dynamic behaviour like this, I thought I'd give it a try.
So far I have code that gets all the possible formats from ZXing using
MemberInfo[] plist = typeof(BarcodeFormat).GetMembers();
That works for getting the names of the formats, meaning I can successfully show the names in a list.
But I am running into a problem when trying to assign these formats to the actual reader, because I only have the MemberInfo and no longer the BarcodeFormat.
So far I have only found examples where the user wanted to access (set / get) variables dynamically.
The proposed solutions however did not seem to fit my problem - at least I didn't find any way to assign the format in those ways.
Any help would be great :)
Thank you very much.
EDIT:
The BarcodeFormat is used like this:
WP7BarcodeManager.ScanMode = BarcodeFormat.ITF;
In this example, only barcodes in the ITF (interleaved 2 out of 5) format would be accepted.
I have so far tried the following approaches.
Simply assign the MemberInfo object instead of the original BarcodeFormat object.
Cast the MemberInfo object to BarcodeFormat.
I tried to use FieldInfo and getValue, however it seems that I have to create an example object and assign a value to the needed field in order to get the value. This can't be done here, because the object is static and the field is readonly.
The whole ZXing library is compiled as a DLL that I link my project to. (it seems to be linked correctly, because everything else works). An example declaration of BarcodeFormat looks like this
public static readonly BarcodeFormat ITF
get ITF dynamically:
var formatName = "ITF";
// ITF is declared as a static readonly field, so use GetField rather than GetProperty
var format = typeof(BarcodeFormat)
.GetField(formatName, BindingFlags.Static | BindingFlags.Public)
.GetValue(null);
set WP7BarcodeManager.ScanMode:
WP7BarcodeManager.ScanMode = (BarcodeFormat)format;
ps
member to BarcodeFormat:
var _format = member is PropertyInfo
? ((PropertyInfo)member).GetValue(null, null)
: ((FieldInfo)member).GetValue(null);
var format = (BarcodeFormat)_format;
"Because static properties belong to the type, not individual objects, get static properties by passing null as the object argument"
For Example :
PropertyInfo CurCultProp = (typeof(CultureInfo)).GetProperty("CurrentCulture");
Console.WriteLine("CurrCult: " + CurCultProp.GetValue(null,null));
So all you need to do is call GetProperties() instead of GetMembers() and call GetValue(null, null) to get the value.
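Whether GetProperty or GetField is the right call depends on how the member is declared: for the declaration quoted in the question (public static readonly BarcodeFormat ITF) the member is a field, and GetProperty would return null for it. A small sketch that tries both is shown here; DemoFormat is a hypothetical stand-in for the real class, and StaticMemberReader is likewise an illustrative name.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a class that exposes its values as static readonly fields.
public sealed class DemoFormat
{
    public static readonly DemoFormat ITF = new DemoFormat("ITF");
    public static readonly DemoFormat EAN13 = new DemoFormat("EAN13");

    public string Name { get; }
    private DemoFormat(string name) { Name = name; }
}

public static class StaticMemberReader
{
    // Returns the value of a public static field or property with the given name,
    // or null when the type has no such member.
    public static object GetStaticValue(Type type, string memberName)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Static;

        FieldInfo field = type.GetField(memberName, flags);
        if (field != null) return field.GetValue(null); // null target: static member

        PropertyInfo property = type.GetProperty(memberName, flags);
        return property?.GetValue(null, null);
    }
}
```

A call such as (DemoFormat)StaticMemberReader.GetStaticValue(typeof(DemoFormat), "ITF") then yields the same instance as DemoFormat.ITF, which could be assigned to a setting like ScanMode in the real library.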
I don't fully understand why you go through the hassle with reflection.
You can enumerate the bar code types like this (ok dummy code, you should probably bind to a listbox/picker but.. ):
var mgr = new BarcodeTypeManager();
foreach (var barCode in mgr.BarcodeTypes)
{
WP7BarcodeManager.ScanMode = barCode.BarcodeType;
}
(In fact, there's also a BarcodePhotoChooser picker you can use.)
And if the user can save a preferred type, you can easily look it up again:
var typeToUse = mgr.BarcodeTypes.Where(b => b.Name == "what user selected").FirstOrDefault();
WP7BarcodeManager.ScanMode = typeToUse.BarcodeType;

fubumvc - rendering a collection as a drop down list

I'm having trouble understanding how to render a collection as a drop down list.
If I have a model like:
public class AccountViewModel {
public string[] Country { get; set; }
}
I would like the string collection to render as a drop down list.
Using the HTML page helper InputFor doesn't seem to work. It simply renders a text box.
I've noticed that InputFor can reflect on the property type and render html accordingly. (Like a checkbox for a boolean field).
I also notice that FubuPageExtensions has methods for CheckBoxFor and TextBoxFor, but nothing equivalent to DropDownListFor.
I'm probably missing something quite fundamental in understanding html conventions in fubu.
Do I need to build the select tag myself? If so, what is the recommended approach to do it?
You are correct that (at the time I last looked) there is no FubuMVC.Core HTML extension method for generating select tags although you could use the HtmlTags library to generate a select tag via code.
As you touch upon in your question the correct way to attack this is likely with an HTML convention together with the HtmlTags library such as that demonstrated in the FubuMVC.Recipes example 'src/UI/HtmlConventionsWithPageExtensions'.
For example an enum generation example might be:
this.Editors
    .If(e => e.Accessor.PropertyType.IsEnum)
    .BuildBy(er =>
    {
        var tag = new HtmlTag("select");
        var enumValues = Enum.GetValues(er.Accessor.PropertyType);
        foreach (var enumValue in enumValues)
        {
            tag.Children.Add(new HtmlTag("option").Text(enumValue.ToString()));
        }
        return tag;
    });
The FubuMVC.Recipes repository is quite new and still growing so there may be some better examples around but hope this gives you some ideas.

How to check if dom has a class using WebDriver (Selenium 2)?

I am very new to Selenium, so my apologies if it's a silly question.
I have successfully wired up IntelliJ (Play! framework) with Selenium, and created some tests using firefoxDrivers.
I'm trying to check if the page had been validated properly.
Long story short, I'm selecting an element like this:
WebElement smallDecel = firefoxDriver.findElement(By.cssSelector("#configTable tr:nth-child(2) td .playerDecelInput"));
I do some further operations (clear and change the value, submit the 'form'), and then I want to check if the TD the input sits in was given another class.
So, the question is - is there a simple technique I can use to find out if a WebElement / DOM has a class specified?
To expand on Sam Woods' answer, I use a simple extension method (this is for C#) to test whether or not an element has a specified class:
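// Requires: using System.Linq;
public static bool HasClass( this IWebElement el, string className ) {
    // GetAttribute returns null when the element has no class attribute.
    var classes = el.GetAttribute( "class" ) ?? string.Empty;
    return classes.Split( ' ' ).Contains( className );
}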
Once you find the element, you can just call myElement.GetAttribute("class"). Then you can parse the string that is returned and see if it contains or does not contain the class name you care about.
You can use FindElement(By.ClassName("name-of-your-class")). I would recommend that you either poll the DOM for a set period of time or call Thread.Sleep(xxxx) before looking for the appended class.

Irony: Tutorial on evaluating AST nodes?

I've defined a simple grammar in Irony, and generated a nice compact AST.
Now I'm trying to figure out how to evaluate it. Problem is, I can't find any tutorials on how to do this.
I've defined just 2 AST nodes:
class TagListNode : AstNode
{
    public override void Init(ParsingContext context, ParseTreeNode treeNode)
    {
        base.Init(context, treeNode);
        AsString = "TagList";
        foreach (var node in treeNode.ChildNodes)
            AddChild(null, node);
    }
    public override void EvaluateNode(Irony.Interpreter.EvaluationContext context, AstMode mode)
    {
        foreach (var node in ChildNodes)
            node.EvaluateNode(context, AstMode.Read);
    }
}
class TagBlockNode : AstNode
{
    public AstNode Content;
    public override void Init(ParsingContext context, ParseTreeNode treeNode)
    {
        base.Init(context, treeNode);
        AsString = treeNode.ChildNodes[0].FindTokenAndGetText();
        Content = AddChild(null, treeNode.ChildNodes[1]);
    }
    public override void EvaluateNode(EvaluationContext context, AstMode mode)
    {
        context.Write(string.Format("<{0}>", AsString));
        Content.EvaluateNode(context, AstMode.Read);
        context.Write(string.Format("</{0}>", AsString));
    }
}
This will generate the following output:
<html><head><title></title></head><body><h1></h1><p></p><p></p></body></html>3.14159265358979
Whereas the output I want is:
<html>
<head>
<title>page title</title>
</head>
<body>
<h1>header</h1>
<p>paragraph 1</p>
<p>3.14159265358979</p>
</body>
</html>
I don't think I'm supposed to be using Context.Write(). The samples show pushing stuff onto context.Data and popping them off... but I'm not quite sure how that works.
I'm guessing pi gets tacked on at the end because it's automatically pushed onto context.Data and then one element is popped off at the end?? I'm not really sure.
Some pointers or a link to a tutorial would be nice.
Also, how am I supposed to handle the different node types? Each "Tag" can have 4 different types of content: another tag, a string literal, a variable, or a number. Should I be writing things like if(node is StringLiteral) .... in the EvaluateNode method or what?
I've found this one but they just loop over the AST and don't take advantage of EvaluateNode.
And then this one which replaces a single value in the data stack...but doesn't really explain how this gets outputted or anything.
To be clear, I specifically want to know how to override the EvaluateNode methods in Irony.Ast.AstNode to do what I want.
Okay, I've traced that tidbit at the end to this line:
if (EvaluationContext.HasLastResult)
EvaluationContext.Write(EvaluationContext.LastResult + Environment.NewLine);
Which is included in the default evaluation routine....perhaps it works well for a calculator app, but not so much in mine. Trying to figure out how to bypass the script interpreter now, but then I don't know how to set the globals.
The best way to iterate through an AST structure is to implement the visitor pattern.
Maybe this link helps you.
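For readers without the link, here is a bare-bones sketch of the visitor pattern applied to a tag tree. It deliberately does not use Irony's AstNode or EvaluationContext: Node, TagNode, TextNode, INodeVisitor and HtmlWriter are all hypothetical types made up for illustration. The point is that each node type dispatches to its own Visit overload, so no if (node is StringLiteral) chains are needed.

```csharp
using System.Text;

public interface INodeVisitor
{
    void Visit(TagNode node);
    void Visit(TextNode node);
}

public abstract class Node
{
    // Each concrete node calls the visitor overload matching its own type.
    public abstract void Accept(INodeVisitor visitor);
}

public sealed class TagNode : Node
{
    public string Name { get; }
    public Node[] Children { get; }

    public TagNode(string name, params Node[] children)
    {
        Name = name;
        Children = children;
    }

    public override void Accept(INodeVisitor visitor) => visitor.Visit(this);
}

public sealed class TextNode : Node
{
    public string Text { get; }

    public TextNode(string text) { Text = text; }

    public override void Accept(INodeVisitor visitor) => visitor.Visit(this);
}

// A visitor that renders the tree as markup into an internal buffer.
public sealed class HtmlWriter : INodeVisitor
{
    private readonly StringBuilder _output = new StringBuilder();

    public void Visit(TagNode node)
    {
        _output.Append('<').Append(node.Name).Append('>');
        foreach (Node child in node.Children)
            child.Accept(this); // recurse; the child picks the right overload
        _output.Append("</").Append(node.Name).Append('>');
    }

    public void Visit(TextNode node) => _output.Append(node.Text);

    public override string ToString() => _output.ToString();
}
```

Building new TagNode("p", new TextNode("paragraph 1")), calling Accept with a new HtmlWriter, and then reading the writer's ToString() gives <p>paragraph 1</p>. Supporting variables or numbers would mean adding one node class and one Visit overload per type.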
