How can I get all HTML attributes with GeckoFX/C#

In C# via GeckoFx, I have not found a method to get all attributes of an element.
To do this, I wrote a JavaScript function. Here is my code:
GeckoWebBrowser GeckoBrowser = ....;
GeckoNode NodeElement = ....; // HTML element whose attributes we want to read
string JSresult = "";
string JStext = @"
function getElementAttributes(element)
{
var AttributesAssocArray = {};
for (var index = 0; index < element.attributes.length; ++index) { AttributesAssocArray[element.attributes[index].name] = element.attributes[index].value; };
return JSON.stringify(AttributesAssocArray);
}
getElementAttributes(this);
";
using (AutoJSContext JScontext = new AutoJSContext(GeckoBrowser.Window.JSContext))
{
// Run the script with NodeElement as "this"; JSresult receives the JSON string
JScontext.EvaluateScript(JStext, (nsISupports)NodeElement.DomObject, out JSresult);
}
Do you have other suggestions for achieving this in C# (without JavaScript)?
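The JSresult string returned by the script is a JSON object such as {"id":"main","class":"box"}. A minimal sketch for turning it into a C# dictionary, assuming System.Text.Json is available (built into .NET Core 3.0 and later, a NuGet package on .NET Framework); the AttributeJson class name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: converts the JSON produced by getElementAttributes(...)
// into an attribute-name -> attribute-value dictionary.
public static class AttributeJson
{
    public static Dictionary<string, string> Parse(string json)
    {
        // An empty result (e.g. the script did not run) yields an empty dictionary.
        if (string.IsNullOrEmpty(json))
            return new Dictionary<string, string>();

        // Attribute values are always strings, so a string -> string map fits.
        return JsonSerializer.Deserialize<Dictionary<string, string>>(json)
               ?? new Dictionary<string, string>();
    }
}
```

Usage would be `var attributes = AttributeJson.Parse(JSresult);` right after the `using` block.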

The property GeckoElement.Attributes gives access to an element's attributes.
So for example (this is untested and uncompiled code):
public string GetElementAttributes(GeckoElement element)
{
var result = new StringBuilder();
foreach(var a in element.Attributes)
{
result.Append(String.Format(" {0} = '{1}' ", a.NodeName, a.NodeValue));
}
return result.ToString();
}


Roslyn: rename a const variable to uppercase

I am trying to convert this:
const string maj = "variable";
into
const string MAJ = "variable";
I'm using a Diagnostic with a CodeFix.
I've already written the Diagnostic:
var localDeclarationConst = node as LocalDeclarationStatementSyntax;
if (localDeclarationConst != null &&
localDeclarationConst.Modifiers.Any(SyntaxKind.ConstKeyword)
)
{
foreach (VariableDeclaratorSyntax variable in localDeclarationConst.Declaration.Variables)
{
var symbol = model.GetDeclaredSymbol(variable);
if (symbol != null)
{
string varName = symbol.Name;
if (!varName.Equals(varName.ToUpper()))
{
addDiagnostic(Diagnostic.Create(Rule, localDeclarationConst.GetLocation(), "Constants must be in uppercase"));
}
}
}
}
But I cannot figure out the CodeFix. Here is what I have written so far:
if (token.IsKind(SyntaxKind.ConstKeyword))
{
var ConstClause = (LocalDeclarationStatementSyntax)token.Parent;
var test = ConstClause.GetText();
var newConstClause = ConstClause.With // Which With... method should I call here?
var newRoot = root.ReplaceNode(ConstClause, newConstClause);
return new[] { CodeAction.Create("Make upper", document.WithSyntaxRoot(newRoot)) };
}
As you can see, I'm looking for the right With... method to call.
Edit:
I'm beginning to understand how it works, but there is one point I still don't understand. Let me explain:
if (token.IsKind(SyntaxKind.ConstKeyword))
{
var ConstClause = (VariableDeclaratorSyntax)token.Parent;
var test = ConstClause.Identifier.Text;
var newConstClause = ConstClause.ReplaceToken(SyntaxFactory.Identifier(test), SyntaxFactory.Identifier(test.ToUpperInvariant()));
var newRoot = root.ReplaceNode(ConstClause, newConstClause);
return new[] { CodeAction.Create("Make upper", document.WithSyntaxRoot(newRoot)) };
}
Here is what I've done. To access the name of the variable (ConstClause.Identifier.Text), I cast to VariableDeclaratorSyntax instead of LocalDeclarationStatementSyntax.
But it doesn't work. What should I use instead?
This would be very helpful, because I need to know how to rename my variables.
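Try ReplaceToken() instead of a With method. Note that ReplaceToken replaces a token that is already in the tree, so its first argument must be the existing token (for example ConstClause.Identifier); a new token built with SyntaxFactory.Identifier(test) is not part of the tree, so there is nothing for it to match. Also, the parent of the const keyword is the LocalDeclarationStatementSyntax, not a VariableDeclaratorSyntax, so that cast fails; start from the node at the diagnostic's span instead.
Also, in your diagnostic, you could just use VariableDeclarator.Identifier instead of forcing the symbol to be created with GetDeclaredSymbol.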
Okay, I found a way and now it works!
Here is the Diagnostic:
var localDeclarationConst = node as LocalDeclarationStatementSyntax;
if (localDeclarationConst != null &&
localDeclarationConst.Modifiers.Any(SyntaxKind.ConstKeyword)
)
{
foreach (VariableDeclaratorSyntax variable in localDeclarationConst.Declaration.Variables)
{
string varName = variable.Identifier.Text;
if (!varName.Equals(varName.ToUpper()))
{
addDiagnostic(Diagnostic.Create(Rule, variable.GetLocation(), "Constants must be in uppercase"));
}
}
}
And here is the CodeFix:
var root = await document.GetSyntaxRootAsync(cancellationToken);
var token = root.FindToken(span.Start);
var node = root.FindNode(span);
if (node.IsKind(SyntaxKind.VariableDeclarator))
{
if (token.IsKind(SyntaxKind.IdentifierToken))
{
var variable = (VariableDeclaratorSyntax)node;
string newName = variable.Identifier.ValueText;
// Upper-case every character of the identifier
string NameDone = newName.ToUpper();
var leading = variable.Identifier.LeadingTrivia;
var trailing = variable.Identifier.TrailingTrivia;
VariableDeclaratorSyntax newVariable = variable.WithIdentifier(SyntaxFactory.Identifier(leading, NameDone, trailing));
var newRoot = root.ReplaceNode(variable, newVariable);
return new[] { CodeAction.Create("Make upper", document.WithSyntaxRoot(newRoot)) };
}
}
If something looks wrong, tell me, but I tried it and it works!

OptionOutputOriginalCase not working in HtmlAgilityPack

I am trying to replace some text in an HTML string using HtmlAgilityPack, inserting ASP.NET user controls, but the output HTML comes out in lowercase. Any idea how to preserve the original case in the output?
Code:
public static string ConvertPageTitlesToCMSTitle(string htmlstring, string themeSlug)
{
var htmlDoc = new HtmlAgilityPack.HtmlDocument()
{
OptionOutputOriginalCase = true,
OptionWriteEmptyNodes = true
};
htmlDoc.LoadHtml(htmlstring);
var stPageTitleTags = htmlDoc.DocumentNode.SelectNodes("//stpagetitle");
foreach (var stPageTitleTag in stPageTitleTags)
{
var pageTitle = Strings.StripHTML(stPageTitleTag.InnerText);
pageTitle = pageTitle.Trim();
var pageId = CreateUpdateContentPageInDb(pageTitle, themeSlug, null, null);
var widgetControl = string.Format("<widget:PageTitleDisplay runat=\"server\" PageId=\"{0}\" Editable=\"True\" />", pageId);
htmlDoc.DocumentNode.InnerHtml = htmlDoc.DocumentNode.InnerHtml.Replace(stPageTitleTag.OuterHtml, widgetControl);
}
return htmlDoc.DocumentNode.OuterHtml;
}
As a workaround, you could create a text node instead of an HTML node:
foreach (var stPageTitleTag in stPageTitleTags)
{
var pageTitle = Strings.StripHTML(stPageTitleTag.InnerText);
pageTitle = pageTitle.Trim();
var pageId = CreateUpdateContentPageInDb(pageTitle, themeSlug, null, null);
var widgetControl = string.Format("<widget:PageTitleDisplay runat=\"server\" PageId=\"{0}\" Editable=\"True\" />", pageId);
// creating a text node
var widget = htmlDoc.CreateTextNode(widgetControl);
// replacing the <stpagetitle> node with the new one (ReplaceChild must be called on the parent)
stPageTitleTag.ParentNode.ReplaceChild(widget, stPageTitleTag);
}
This should get the output you want.

C# HtmlDecode Specific tags only

I have a large HTML-encoded string and I want to decode only specific whitelisted HTML tags.
Is there a way to do this in C#? WebUtility.HtmlDecode() decodes everything.
I am looking for an implementation of DecodeSpecificTags() that will pass the test below.
[Test]
public void DecodeSpecificTags_SimpleInput_True()
{
string input = "&lt;span&gt;i am &lt;strong color=blue&gt;very&lt;/strong&gt; big &lt;br&gt;man.&lt;/span&gt;";
string output = "&lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;";
List<string> whiteList = new List<string>(){ "strong","br" } ;
Assert.IsTrue(DecodeSpecificTags(whiteList,input) == output);
}
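You could do something like this:
public string DecodeSpecificTags(List<string> whiteListedTagNames, string encodedInput)
{
foreach (string s in whiteListedTagNames)
{
// Match an encoded opening or closing tag, e.g. "&lt;strong color=blue&gt;" or "&lt;/strong&gt;"
string regex = "&lt;" + @"\s*/?\s*" + Regex.Escape(s) + @"\b.*?" + "&gt;";
// Decode only the matched tag text, leaving everything else encoded
encodedInput = Regex.Replace(encodedInput, regex, m => WebUtility.HtmlDecode(m.Value));
}
return encodedInput;
}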
A better approach could be to use an HTML parser such as HtmlAgilityPack, CsQuery, or NSoup to find the specific elements and decode them in a loop.
Check this for links and examples of parsers.
Check this out; I did it using CsQuery:
string input = "&lt;span&gt;i am &lt;strong color=blue&gt;very&lt;/strong&gt; big &lt;br&gt;man.&lt;/span&gt;";
string output = "&lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;";
var decoded = HttpUtility.HtmlDecode(output);
var encoded = input; // HttpUtility.HtmlEncode(decoded);
Console.WriteLine(encoded);
Console.WriteLine(decoded);
var doc=CsQuery.CQ.CreateDocument(decoded);
var paras=doc.Select("strong").Union(doc.Select ("br")) ;
var tags=new List<KeyValuePair<string, string>>();
var counter=0;
foreach (var element in paras)
{
Console.WriteLine(HttpUtility.HtmlEncode(element.OuterHTML)); // was LINQPad's .Dump()
var key ="---" + counter + "---";
var value= HttpUtility.HtmlDecode(element.OuterHTML);
var pair= new KeyValuePair<String,String>(key,value);
element.OuterHTML = key ;
tags.Add(pair);
counter++;
}
var finalstring= HttpUtility.HtmlEncode(doc.Document.Body.InnerHTML);
Console.WriteLine(finalstring); // was LINQPad's .Dump()
foreach (var element in tags)
{
finalstring=finalstring.Replace(element.Key,element.Value);
}
Console.WriteLine(finalstring);
Or you could use HtmlAgilityPack with a blacklist or a whitelist, depending on your requirements. I'm using the blacklist approach.
My blacklisted tags are stored in a text file, for example "script|img".
public static string DecodeSpecificTags(this string content, List<string> blackListedTags)
{
if (string.IsNullOrEmpty(content))
{
return content;
}
blackListedTags = blackListedTags.Select(t => t.ToLowerInvariant()).ToList();
var decodedContent = HttpUtility.HtmlDecode(content);
var document = new HtmlDocument();
document.LoadHtml(decodedContent);
decodedContent = blackListedTags.Select(blackListedTag => document.DocumentNode.Descendants(blackListedTag))
.Aggregate(decodedContent,
(current1, nodes) =>
nodes.Select(htmlNode => htmlNode.WriteTo())
.Aggregate(current1,
(current, nodeContent) =>
current.Replace(nodeContent, HttpUtility.HtmlEncode(nodeContent))));
return decodedContent;
}

Remove HTML from string

I am trying to strip the HTML from my RSS feed. I cannot work out how to change the code below to remove the HTML markup.
var rssFeed = XElement.Parse(e.Result);
var currentFeed = this.DataContext as app.ViewModels.FeedViewModel;
var items = from item in rssFeed.Descendants("item")
select new ATP_Tennis_App.ViewModels.FeedItemViewModel()
{
Title = item.Element("title").Value,
DatePublished = DateTime.Parse(item.Element("pubDate").Value),
Url = item.Element("link").Value,
Description = item.Element("description").Value
};
foreach (var item in items)
currentFeed.Items.Add(item);
Just use the following code:
var withHtml = "<p>hello <b>there</b></p>";
var withoutHtml = Regex.Replace(withHtml, "<.+?>", string.Empty);
This strips the HTML, leaving only the text, so "hello there".
So, you can just copy and use this function:
string RemoveHtmlTags(string html) {
return Regex.Replace(html, "<.+?>", string.Empty);
}
Your code will look something like this:
var rssFeed = XElement.Parse(e.Result);
var currentFeed = this.DataContext as app.ViewModels.FeedViewModel;
var items = from item in rssFeed.Descendants("item")
select new ATP_Tennis_App.ViewModels.FeedItemViewModel()
{
Title = RemoveHtmlTags(item.Element("title").Value),
DatePublished = DateTime.Parse(item.Element("pubDate").Value),
Url = item.Element("link").Value,
Description = RemoveHtmlTags(item.Element("description").Value)
};
You can use this code sample; it works fine for me:
public static string RemoveHTMLTags(string value)
{
string step1 = Regex.Replace(value, "<[^>]*>", " ");
string step2 = HttpUtility.HtmlDecode(step1);
return step2;
}
I hope, this code helps you.
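The two patterns used in the answers above behave differently on markup that spans lines: in .NET, `.` does not match `\n` unless RegexOptions.Singleline is set, so `<.+?>` leaves a tag such as `<p\nclass="x">` in place, while `<[^>]*>` removes it. A small sketch to check this, using WebUtility.HtmlDecode (System.Net) in place of HttpUtility.HtmlDecode and an empty replacement instead of a space; the HtmlStripper name is hypothetical:

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper comparing the two tag-stripping patterns.
public static class HtmlStripper
{
    // "." excludes '\n' by default, so a tag broken across lines is not matched.
    public static string StripLazyDot(string html) =>
        Regex.Replace(html, "<.+?>", string.Empty);

    // "[^>]" matches any character except '>', including '\n'.
    public static string StripNegatedClass(string html) =>
        Regex.Replace(html, "<[^>]*>", string.Empty);

    // Strip tags first, then decode entities, so "&lt;b&gt;" in the feed survives
    // as the literal text "<b>" instead of being removed as a tag.
    public static string ToPlainText(string html) =>
        WebUtility.HtmlDecode(StripNegatedClass(html));
}
```

The order in ToPlainText matters: decoding first would turn encoded text into real-looking tags that the regex would then delete.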
Use the following utility method:
HttpUtility.HtmlDecode(string);
Please don't refer to this answer anymore. (HtmlDecode only decodes entities such as &lt; and &amp;; it does not remove tags.)

Formatting Twitter text (TweetText) with C#

Is there a better way to format text from Twitter to link the hyperlinks, usernames and hashtags? What I have is working, but I know this could be done better. I am interested in alternative techniques. I am setting this up as an HTML helper for ASP.NET MVC.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web;
using System.Web.Mvc;
namespace Acme.Mvc.Extensions
{
public static class MvcExtensions
{
const string ScreenNamePattern = @"@([A-Za-z0-9\-_&;]+)";
const string HashTagPattern = @"#([A-Za-z0-9\-_&;]+)";
const string HyperLinkPattern = @"(http://\S+)\s?";
public static string TweetText(this HtmlHelper helper, string text)
{
return FormatTweetText(text);
}
public static string FormatTweetText(string text)
{
string result = text;
if (result.Contains("http://"))
{
var links = new List<string>();
foreach (Match match in Regex.Matches(result, HyperLinkPattern))
{
var url = match.Groups[1].Value;
if (!links.Contains(url))
{
links.Add(url);
result = result.Replace(url, String.Format("<a href=\"{0}\">{0}</a>", url));
}
}
}
if (result.Contains("@"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, ScreenNamePattern))
{
var screenName = match.Groups[1].Value;
if (!names.Contains(screenName))
{
names.Add(screenName);
result = result.Replace("@" + screenName,
String.Format("@{0}", screenName));
}
}
}
if (result.Contains("#"))
{
var names = new List<string>();
foreach (Match match in Regex.Matches(result, HashTagPattern))
{
var hashTag = match.Groups[1].Value;
if (!names.Contains(hashTag))
{
names.Add(hashTag);
result = result.Replace("#" + hashTag,
String.Format("#{1}",
HttpUtility.UrlEncode("#" + hashTag), hashTag));
}
}
}
return result;
}
}
}
That is remarkably similar to the code I wrote that displays my Twitter status on my blog. The only additional things I do are:
1) looking up @name and replacing it with the user's real name;
2) adding commas between multiple @names in a row, if they don't already have them;
3) formatting tweets that start with @name(s) as "To @name:".
I don't see any reason this can't be an effective way to parse a tweet: tweets have a very consistent format (good for regex), and in most situations the speed (milliseconds) is more than acceptable.
Edit:
Here is the code for my tweet parser. It's a bit too long to put in a Stack Overflow answer. It takes a tweet like:
@user1 @user2 check out this cool link I got from @user3: http://url.com/page.htm#anchor #coollinks
And turns it into:
<span class="salutation">
To Real Name,
Real Name:
</span> check out this cool link I got from
<span class="salutation">
Real Name
</span>:
http://site.com/...
#coollinks
It also wraps all that markup in a little JavaScript:
document.getElementById('twitter').innerHTML = '{markup}';
This is so the tweet fetcher can run asynchronously as a JS and if Twitter is down or slow it won't affect my site's page load time.
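Since the question asks for alternative techniques, one option is to handle URLs, @mentions and #hashtags in a single Regex.Replace pass with a MatchEvaluator. In the multi-pass version, the hashtag pass can rewrite text inside a link that the URL pass already produced (for example the `#anchor` in `http://url.com/page.htm#anchor`); a single left-to-right pass never revisits replaced text. This is only a sketch: the profile and search URLs are assumptions, as is the TweetFormatter name, and plain text outside the matches is not HTML-encoded.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical single-pass formatter for URLs, @mentions and #hashtags.
public static class TweetFormatter
{
    // Alternation with named groups; the URL branch is tried first at each
    // position, so a "#" inside a URL is consumed as part of the URL.
    private static readonly Regex TokenPattern = new Regex(
        @"(?<url>https?://\S+)|@(?<user>\w+)|#(?<tag>\w+)",
        RegexOptions.Compiled);

    public static string Format(string text)
    {
        return TokenPattern.Replace(text, m =>
        {
            if (m.Groups["url"].Success)
            {
                string url = WebUtility.HtmlEncode(m.Value);
                return $"<a href=\"{url}\">{url}</a>";
            }
            if (m.Groups["user"].Success)
            {
                string user = m.Groups["user"].Value;
                // Assumed profile URL scheme.
                return $"@<a href=\"https://twitter.com/{user}\">{user}</a>";
            }
            string tag = m.Groups["tag"].Value;
            // Assumed search URL; EscapeDataString turns "#" into "%23".
            string query = Uri.EscapeDataString("#" + tag);
            return $"<a href=\"https://twitter.com/search?q={query}\">#{tag}</a>";
        });
    }
}
```

Because Regex.Replace only ever looks at the original input, there is no need for the `links`/`names` de-duplication lists either.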
I created a helper method that shortens text to 140 characters with the URL included. You can set the share length to 0 to exclude the URL from the tweet.
public static string FormatTwitterText(this string text, string shareurl)
{
if (string.IsNullOrEmpty(text))
return string.Empty;
string finaltext = string.Empty;
string sharepath = string.Format("http://url.com/{0}", shareurl);
//list of all words, trimmed and new space removed
List<string> textlist = text.Split(' ').Select(txt => Regex.Replace(txt, @"\n", "").Trim())
.Where(formatedtxt => !string.IsNullOrEmpty(formatedtxt))
.ToList();
int extraChars = 3; //to account for the two dots ".."
int finalLength = 140 - sharepath.Length - extraChars;
int runningLengthCount = 0;
int collectionCount = textlist.Count;
int count = 0;
foreach (string eachwordformated in textlist
.Select(eachword => string.Format("{0} ", eachword)))
{
count++;
int textlength = eachwordformated.Length;
runningLengthCount += textlength;
int nextcount = count; // count was already incremented, so it is the index of the next word
var nextTextlength = nextcount < collectionCount ?
textlist[nextcount].Length :
0;
if (runningLengthCount + nextTextlength < finalLength)
finaltext += eachwordformated;
}
return runningLengthCount > finalLength ? finaltext.Trim() + ".." : finaltext.Trim();
}
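For comparison, a shorter sketch of the same idea: keep whole words while they fit in the space left after the URL and the ".." marker. The TweetTruncator name and the maxLength parameter are hypothetical, and unlike the helper above it takes the full URL instead of building it from a base address.

```csharp
using System;
using System.Text;

// Hypothetical helper: fits "<text> <url>" into maxLength characters.
public static class TweetTruncator
{
    public static string Shorten(string text, string url, int maxLength = 140)
    {
        const string ellipsis = "..";
        string suffix = string.IsNullOrEmpty(url) ? string.Empty : " " + url;

        // A null separator array splits on any whitespace, including newlines.
        string[] words = (text ?? string.Empty)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        string full = string.Join(" ", words);
        if (full.Length + suffix.Length <= maxLength)
            return full + suffix;

        // Room left for words once the URL and the ".." marker are reserved.
        int budget = maxLength - suffix.Length - ellipsis.Length;
        var kept = new StringBuilder();
        foreach (string word in words)
        {
            int needed = kept.Length == 0 ? word.Length : word.Length + 1;
            if (kept.Length + needed > budget)
                break; // stop at the first word that does not fit
            if (kept.Length > 0)
                kept.Append(' ');
            kept.Append(word);
        }
        return kept.ToString() + ellipsis + suffix;
    }
}
```

Stopping at the first word that does not fit keeps the shortened text contiguous, rather than skipping a long word and appending a shorter one that comes later.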
There is a good resource for parsing Twitter messages at this link; it worked for me:
How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0
http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/
It contains support for:
URLs
#hashtags
@usernames
BTW: the regex in the ParseURL() method needs reviewing; it turns stock symbols (BARC.L) into links.
