C# Replace Emoticons In Html String - c#

I have the following method for replacing emoticons in a string using C#:
public static string Emotify(string inputText)
{
    // EmoticonFolder is a static property defined elsewhere in the class (not shown).
    var emoticonFolder = EmoticonFolder;
    var emoticons = new Hashtable(100)
    {
        {":)", "facebook-smiley-face-for-comments.png"},
        {":D", "big-smile-emoticon-for-facebook.png"},
        {":(", "facebook-frown-emoticon.png"},
        {":'(", "facebook-cry-emoticon-crying-symbol.png"},
        {":P", "facebook-tongue-out-emoticon.png"},
        {"O:)", "angel-emoticon.png"},
        {"3:)", "devil-emoticon.png"},
        {":/", "unsure-emoticon.png"},
        {">:O", "angry-emoticon.png"},
        {":O", "surprised-emoticon.png"},
        {"-_-", "squinting-emoticon.png"},
        {":*", "kiss-emoticon.png"},
        {"^_^", "kiki-emoticon.png"},
        {">:(", "grumpy-emoticon.png"},
        {":v", "pacman-emoticon.png"},
        {":3", "curly-lips-emoticon.png"},
        {"o.O", "confused-emoticon-wtf-symbol-for-facebook.png"},
        {";)", "wink-emoticon.png"},
        {"8-)", "glasses-emoticon.png"},
        // "8|" and "B|" are two separate codes; a single "8| B|" key would only match that exact text.
        {"8|", "sunglasses-emoticon.png"},
        {"B|", "sunglasses-emoticon.png"}
    };
    var sb = new StringBuilder(inputText.Length);
    for (var i = 0; i < inputText.Length; i++)
    {
        // Look for an emoticon code starting at position i.
        var strEmote = string.Empty;
        foreach (string emote in emoticons.Keys)
        {
            if (inputText.Length - i >= emote.Length && emote.Equals(inputText.Substring(i, emote.Length), StringComparison.InvariantCultureIgnoreCase))
            {
                strEmote = emote;
                break;
            }
        }
        if (strEmote.Length != 0)
        {
            sb.AppendFormat("<img src=\"{0}{1}\" alt=\"\" class=\"emoticon\" />", emoticonFolder, emoticons[strEmote]);
            // Skip the remaining characters of the matched code.
            i += strEmote.Length - 1;
        }
        else
        {
            sb.Append(inputText[i]);
        }
    }
    return sb.ToString();
}
It works great and 'seems' pretty fast; however, I realised there is a slight problem with HTML.
This method breaks pages that contain a link because of the :/ emoticon: it breaks the http:// by sticking an image in the middle. I'm trying to figure out a way to adapt this method to take links into account and ignore them, but without sacrificing performance.
Any help or pointers greatly appreciated.

HTML Agility Pack and regex will be your friend here. You could have a decorator where your decorations build up the src? Can we have an example of the src that causes the issue? :)
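A minimal sketch of the regex half of that suggestion, using only System.Text.RegularExpressions rather than HTML Agility Pack: HTML tags and bare URLs are copied through untouched, and emoticon codes are replaced only in the text between them. The class name SafeEmotifier, the URL pattern, and the assumption that attribute values never contain a literal > are illustrative choices, not part of the question; a real HTML parser is more robust for arbitrary markup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class SafeEmotifier
{
    // Matches an HTML tag or a bare http/https/ftp URL; these spans are copied verbatim.
    // Assumption: attribute values never contain a literal '>'.
    private static readonly Regex Protected = new Regex(
        @"<[^>]*>|\b(?:https?|ftp)://[^\s<""]+",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Emotify(string input, IDictionary<string, string> emoticons, string folder)
    {
        // Longest codes first, so a longer code wins over a shorter one that is its prefix.
        string[] codes = emoticons.Keys.OrderByDescending(k => k.Length).ToArray();
        var sb = new StringBuilder(input.Length);
        int last = 0;
        foreach (Match m in Protected.Matches(input))
        {
            ReplaceCodes(input, last, m.Index, codes, emoticons, folder, sb); // text before the tag/URL
            sb.Append(m.Value);                                               // the tag/URL itself
            last = m.Index + m.Length;
        }
        ReplaceCodes(input, last, input.Length, codes, emoticons, folder, sb); // trailing text
        return sb.ToString();
    }

    // Same scan as the question's loop, restricted to input[start, end).
    private static void ReplaceCodes(string s, int start, int end, string[] codes,
        IDictionary<string, string> emoticons, string folder, StringBuilder sb)
    {
        for (int i = start; i < end; i++)
        {
            string hit = null;
            foreach (string code in codes)
            {
                // Compare in place instead of allocating a Substring for every candidate.
                if (end - i >= code.Length &&
                    string.Compare(s, i, code, 0, code.Length, StringComparison.OrdinalIgnoreCase) == 0)
                {
                    hit = code;
                    break;
                }
            }
            if (hit == null)
            {
                sb.Append(s[i]);
                continue;
            }
            sb.Append("<img src=\"").Append(folder).Append(emoticons[hit])
              .Append("\" alt=\"\" class=\"emoticon\" />");
            i += hit.Length - 1; // skip the rest of the code
        }
    }
}
```

The emoticon table from the question could be passed in as a Dictionary<string, string>. Because the protected spans are found in one regex pass, the per-character work stays the same as in the original loop, and the in-place string.Compare avoids one allocation per candidate code.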

Related

CefSharp - Get Value of HTML Element

How can I get the value of an HTML element with CefSharp?
I know how to do it with the default WebBrowser control:
Dim Elem As HtmlElement = WebBrowser1.Document.GetElementByID("id")
But I didn't find anything similar for CefSharp. The main reason I am using CefSharp is that part of the website uses iframes to store the source, and the default WebBrowser doesn't support that. Also, does CefSharp have an option to InvokeMember or a similar call?
I'm using the latest release of CefSharp, by the way.
There is a really good example of how to do this in their FAQ.
https://github.com/cefsharp/CefSharp/wiki/Frequently-asked-questions#2-how-do-you-call-a-javascript-method-that-return-a-result
Here is the code for the lazy. Pretty self-explanatory, and it worked well for me.
string script = "document.getElementById('startMonth').value;";
browser.EvaluateScriptAsync(script).ContinueWith(x =>
{
    var response = x.Result;
    if (response.Success && response.Result != null)
    {
        var startDate = response.Result;
        // startDate is the value of the HTML element.
    }
});
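Several of the answers here trip over two details: JavaScript is case-sensitive (document.getElementById, not GetElementByID), and document.getElementById(...).value throws a TypeError in the page when the element is missing, which typically surfaces as response.Success being false. A small sketch of a helper that builds a null-safe script string, assuming .NET Core 3.0 or later for System.Text.Json (used only to quote the id as a valid string literal); DomScripts and ElementValue are hypothetical names.

```csharp
using System;
using System.Text.Json;

public static class DomScripts
{
    // Builds a script that evaluates to the element's value, or null when no
    // element has that id (instead of throwing a TypeError in the page).
    public static string ElementValue(string id)
    {
        // Serializing the id as JSON yields a correctly quoted and escaped string literal.
        string literal = JsonSerializer.Serialize(id);
        return "(function () { var el = document.getElementById(" + literal +
               "); return el ? el.value : null; })();";
    }
}
```

The result would be passed to the same call used above, e.g. browser.EvaluateScriptAsync(DomScripts.ElementValue("startMonth")); a missing element then comes back as a null Result rather than a failed response.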
This is the only way that worked for me (version 57.0.0.0):
((CefSharp.Wpf.ChromiumWebBrowser)chromeBrowser).FrameLoadEnd += Browser_FrameLoadEnd;
// ...
async void Browser_FrameLoadEnd(object sender, CefSharp.FrameLoadEndEventArgs e)
{
    Console.WriteLine("cef-" + e.Url);
    if (e.Frame.IsMain)
    {
        // Full HTML source of the main frame once it has finished loading.
        string HTML = await e.Frame.GetSourceAsync();
        Console.WriteLine(HTML);
    }
}
This worked for me. You can modify it yourself.
private async void TEST()
{
    string script = "document.getElementsByClassName('glass')[0]['firstElementChild']['firstChild']['wholeText']";
    JavascriptResponse response = await browser.EvaluateScriptAsync(script);
    label1.Text = response.Result.ToString();
}
Maybe this can do your job.
private async void TEST()
{
    // JavaScript is case-sensitive: document.getElementById, not Document.GetElementByID.
    string script = "document.getElementById('id').value";
    JavascriptResponse response = await browser.EvaluateScriptAsync(script);
    string resultS = response.Result.ToString(); // whatever you need
}
With CefSharp, you can get an element's value using JavaScript.
For example:
// ExecuteScriptAsync does not return a result; use EvaluateScriptAsync to read a value back.
m_browser.ExecuteScriptAsync("document.getElementById('id1');");
You can learn JavaScript from W3Schools.
And I think you should read this passage.
Have fun.
string script = @"document.getElementById('id_element').style;";
browser.EvaluateScriptAsync(script).ContinueWith(x =>
{
    var response = x.Result;
    if (response.Success && response.Result != null)
    {
        // The returned JavaScript object arrives here as an ExpandoObject.
        System.Dynamic.ExpandoObject abc = (System.Dynamic.ExpandoObject)response.Result;
        foreach (KeyValuePair<string, object> item in abc)
        {
            string key = item.Key;
            string value = item.Value?.ToString(); // property values can be null
        }
    }
});
It works for me.
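Because ExpandoObject implements IDictionary<string, object>, the returned object can also be copied into an ordinary dictionary for later use. A small sketch that keeps null property values as null instead of calling ToString() on them; ScriptResults and ToStringMap are hypothetical names, and response.Result from the code above would be the argument.

```csharp
using System;
using System.Collections.Generic;

public static class ScriptResults
{
    // Copies the properties of a returned object (e.g. an ExpandoObject) into a
    // string dictionary. Null values stay null rather than throwing on ToString().
    public static Dictionary<string, string> ToStringMap(object result)
    {
        var map = new Dictionary<string, string>();
        if (result is IDictionary<string, object> props)
        {
            foreach (KeyValuePair<string, object> item in props)
            {
                map[item.Key] = item.Value?.ToString();
            }
        }
        return map; // empty when the result was not an object
    }
}
```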

Trouble with direct download links from Mediafire

I had trouble downloading a file from Mediafire. I found out that I have to use their API. I found another SO question: "Get direct download link and file site from Mediafire.com"
With the help of the shown functions I created the following class:
class Program
{
    static void Main(string[] args)
    {
        Mediafireclass mf = new Mediafireclass();
        WebClient webClient = new WebClient();
        mf.Mediafiredownload("somemediafirelink/test.txt");
        webClient.DownloadFileAsync(new Uri("somemediafirelink/test.txt"), @"location to save/test.txt");
    }
}
and used the function by T3KBAU5 like this:
internal class Mediafireclass
{
    public string Mediafiredownload(string download)
    {
        HttpWebRequest req;
        HttpWebResponse res;
        string str = "";
        req = (HttpWebRequest)WebRequest.Create(download);
        res = (HttpWebResponse)req.GetResponse();
        str = new StreamReader(res.GetResponseStream()).ReadToEnd();
        int indexurl = str.IndexOf("http://download");
        int indexend = GetNextIndexOf('"', str, indexurl);
        string direct = str.Substring(indexurl, indexend - indexurl);
        return direct;
    }
    private int GetNextIndexOf(char c, string source, int start)
    {
        if (start < 0 || start > source.Length - 1)
        {
            throw new ArgumentOutOfRangeException();
        }
        for (int i = start; i < source.Length; i++)
        {
            if (source[i] == c)
            {
                return i;
            }
        }
        return -1;
    }
}
But when I run it this error pops up:
Screenshot of the Error
What can I do to solve the problem, and can you explain what this error means?
Firstly, the Mediafiredownload method returns a string, the direct download link, which you are not using. Your code should resemble:
Mediafireclass mf = new Mediafireclass();
WebClient webClient = new WebClient();
string directLink = mf.Mediafiredownload("somemediafirelink/test.txt");
webClient.DownloadFileAsync(new Uri(directLink), @"location to save/test.txt");
As for the exception it's firing, it's important to understand what the GetNextIndexOf method is doing: iterating through a string, source, to find the index of a character, c, after a certain start position, start. The first line in that method checks that start lies within the bounds of source and throws an ArgumentOutOfRangeException if it doesn't. That happens, for example, when str.IndexOf("http://download") finds nothing and returns -1. You need to set a breakpoint on this line:
int indexend = GetNextIndexOf('"', str, indexurl);
And look at the values of str and indexurl using the locals window. This will reveal the problem.
Also, the code you are using is almost 5 years old, and I expect this problem has more to do with the fact that Mediafire will have changed its URL structure since then. Your code relies on the page containing "http://download", which may no longer be the case.
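A defensive sketch of the same extraction that returns null instead of throwing when the marker is missing. It uses the built-in string.IndexOf(char, int) in place of GetNextIndexOf; the default marker "http://download" comes from the question's code and may no longer match Mediafire's pages, and DirectLink.Extract is a hypothetical name.

```csharp
using System;

public static class DirectLink
{
    // Returns the text from the marker up to (not including) the next double quote,
    // or null when the marker or the closing quote is missing.
    public static string Extract(string html, string marker = "http://download")
    {
        int start = html.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return null; // marker not found: the page layout has probably changed
        int end = html.IndexOf('"', start);
        if (end < 0) return null;
        return html.Substring(start, end - start);
    }
}
```

The caller can then check for null before constructing the Uri, instead of passing -1 on to GetNextIndexOf.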
The easiest way is to use a DLL, such as DirektDownloadLinkCatcher.
Otherwise, you have to query for the right div by its "download_link" class and then get the href of the link inside it. That's how I solved it in that DLL. ^^
Or use the API from MediaFire.
Hope I could help.
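A rough sketch of the "find the download_link div, then take the href" idea using a regular expression instead of an HTML parser. The markup it expects (a div whose class list contains download_link, followed by an a element with an href) is an assumption based on the description above, not checked against the live site; MediafirePage and FindDownloadHref are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MediafirePage
{
    // Assumed markup: a <div> whose class list contains "download_link",
    // with an <a href="..."> somewhere after its opening tag.
    private static readonly Regex DownloadLink = new Regex(
        @"<div[^>]*\bclass=""[^""]*\bdownload_link\b[^""]*""[^>]*>.*?<a\b[^>]*\bhref=""([^""]+)""",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the href value, or null when the expected markup is not found.
    public static string FindDownloadHref(string html)
    {
        Match m = DownloadLink.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```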

Analysing C# source with Irony

This is what my team and I chose to do for our school project. Well, actually we haven't decided on how to parse the C# source files yet.
What we are aiming to achieve is to perform a full analysis of a C# source file and produce a report.
The report is going to describe what is happening in the code.
The report only has to contain:
string literals
method names
variable names
field names
etc
I'm in charge of looking into this Irony library. To be honest, I don't know the best way to sort the data into a clean, readable report. I am using the C# grammar class packed with the zip.
Is there any step where I can properly identify each node's children (e.g. using directives, namespace declaration, class declaration, method body)?
Any help or advice would be very much appreciated. Thanks.
EDIT: Sorry, I forgot to say we need to analyze the method calls too.
Your main goal is to master the basics of formal languages. A good starting point might be found here. This article describes how to use Irony, using the grammar of a simple numeric calculator as an example.
Suppose you want to parse a file containing C# code whose path you know:
private void ParseForLongMethods(string path)
{
    _parser = new Parser(new CSharpGrammar());
    if (_parser == null || !_parser.Language.CanParse()) return;
    _parseTree = null;
    GC.Collect(); // to avoid disruption of perf times with occasional collections
    _parser.Context.SetOption(ParseOptions.TraceParser, true);
    try
    {
        string contents = File.ReadAllText(path);
        _parser.Parse(contents); //, "<source>");
    }
    catch (Exception ex)
    {
        // The exception is swallowed here; consider logging ex so parse failures are not silent.
    }
    finally
    {
        _parseTree = _parser.Context.CurrentParseTree;
        TraverseParseTree();
    }
}
And here is the traversal method itself, which collects some information from the nodes. This code counts the number of statements in every method of the class. If you have any questions, you are always welcome to ask me.
private void TraverseParseTree()
{
    if (_parseTree == null) return;
    ParseNodeRec(_parseTree.Root);
}
// currentClass, currentFile, currentProject, currentSolution, name, foundSmells, Options,
// FindStatements, GetLine and BadSmellRegistry come from the answerer's project (not shown).
private void ParseNodeRec(ParseTreeNode node)
{
    if (node == null) return;
    string functionName = "";
    if (node.ToString().CompareTo("class_declaration") == 0)
    {
        // Takes the class name from the third child of the class_declaration node.
        ParseTreeNode tmpNode = node.ChildNodes[2];
        currentClass = tmpNode.AstNode.ToString();
    }
    if (node.ToString().CompareTo("method_declaration") == 0)
    {
        foreach (var child in node.ChildNodes)
        {
            if (child.ToString().CompareTo("qual_name_with_targs") == 0)
            {
                // Follow the first child down to the leaf that holds the method name.
                ParseTreeNode tmpNode = child.ChildNodes[0];
                while (tmpNode.ChildNodes.Count != 0)
                {
                    tmpNode = tmpNode.ChildNodes[0];
                }
                functionName = tmpNode.AstNode.ToString();
            }
            if (child.ToString().CompareTo("method_body") == 0) //method_declaration
            {
                int statementsCount = FindStatements(child);
                // Register bad smell
                if (statementsCount > ((LongMethodsOptions)this.Options).MaxMethodLength)
                {
                    //function.StartPoint.Line
                    int functionLine = GetLine(functionName);
                    foundSmells.Add(new BadSmellRegistry(name, functionLine, currentFile, currentProject, currentSolution, false));
                }
            }
        }
    }
    // Recurse into every child so nested declarations are visited too.
    foreach (var child in node.ChildNodes)
    {
        ParseNodeRec(child);
    }
}
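The same recursive pattern can be reduced to a sketch that runs without Irony and shows one way to turn node types into report entries. Node below is a hypothetical stand-in for ParseTreeNode (a term name, optional token text, and child nodes), and the term names and tree shape are invented for illustration; the real CSharpGrammar uses its own names and layout (e.g. qual_name_with_targs above). Registering one handler per term keeps the report logic out of the traversal itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Irony's ParseTreeNode: a term name, optional token text, children.
public sealed class Node
{
    public string Term { get; }
    public string Text { get; }
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(string term, string text = null, params Node[] children)
    {
        Term = term;
        Text = text;
        ChildNodes.AddRange(children);
    }
}

public sealed class ReportWalker
{
    // One handler per term of interest; every other node is simply descended into.
    private readonly Dictionary<string, Action<Node>> _handlers;

    public List<string> Report { get; } = new List<string>();

    public ReportWalker()
    {
        _handlers = new Dictionary<string, Action<Node>>
        {
            ["class_declaration"] = n => Report.Add("class: " + ChildText(n, "identifier")),
            ["method_declaration"] = n => Report.Add("method: " + ChildText(n, "identifier")),
            ["string_literal"] = n => Report.Add("string: " + n.Text),
        };
    }

    // Depth-first, so entries appear in source order.
    public void Walk(Node node)
    {
        if (node == null) return;
        if (_handlers.TryGetValue(node.Term, out Action<Node> handle)) handle(node);
        foreach (Node child in node.ChildNodes) Walk(child);
    }

    // Text of the first direct child with the given term name, or null if there is none.
    private static string ChildText(Node node, string term)
    {
        foreach (Node child in node.ChildNodes)
            if (child.Term == term) return child.Text;
        return null;
    }
}
```

Method calls, variables or fields would each get another entry in the handler table once their term names in the actual grammar are known.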
I'm not sure this is what you need, but you could use the CodeDom and CodeDom.Compiler namespaces to compile the C# code and then analyze the results using Reflection, something like:
// Create the assembly in memory (classCode holds the C# source text).
// CodeDom compilation only works on .NET Framework; on .NET Core it throws PlatformNotSupportedException.
var compileParams = new CompilerParameters { GenerateInMemory = true };
CodeSnippetCompileUnit code = new CodeSnippetCompileUnit(classCode);
CSharpCodeProvider provider = new CSharpCodeProvider();
CompilerResults results = provider.CompileAssemblyFromDom(compileParams, code);
foreach (var type in results.CompiledAssembly.GetTypes())
{
    // Your analysis goes here
}
Update: In VS2015 you could use the new C# compiler (AKA Roslyn) to do the same, for example:
// sourceCode holds the C# text; this sample assumes it starts with a using directive.
var tree = CSharpSyntaxTree.ParseText(sourceCode);
var root = (CompilationUnitSyntax)tree.GetRoot();
var compilation = CSharpCompilation.Create("HelloTDN")
    .AddReferences(MetadataReference.CreateFromFile(typeof(object).Assembly.Location))
    .AddSyntaxTrees(tree);
var model = compilation.GetSemanticModel(tree);
var nameInfo = model.GetSymbolInfo(root.Usings[0].Name);
var systemSymbol = (INamespaceSymbol)nameInfo.Symbol;
foreach (var ns in systemSymbol.GetNamespaceMembers())
{
    Console.WriteLine(ns.Name);
}

Select elements added to the DOM by a script

I've been trying to get either an <object> or an <embed> tag using:
HtmlNode videoObjectNode = doc.DocumentNode.SelectSingleNode("//object");
HtmlNode videoEmbedNode = doc.DocumentNode.SelectSingleNode("//embed");
This doesn't seem to work.
Can anyone please tell me how to get these tags and their InnerHtml?
A YouTube embedded video looks like this:
<embed height="385" width="640" type="application/x-shockwave-flash"
src="http://s.ytimg.com/yt/swf/watch-vfl184368.swf" id="movie_player" flashvars="..."
allowscriptaccess="always" allowfullscreen="true" bgcolor="#000000">
I have a feeling the JavaScript might stop the swf player from working; hope not...
Cheers
Update 2010-08-26 (in response to OP's comment):
I think you're thinking about it the wrong way, Alex. Suppose I wrote some C# code that looked like this:
string codeBlock = "if (x == 1) Console.WriteLine(\"Hello, World!\");";
Now, if I wrote a C# parser, should it recognize the contents of the string literal above as C# code and highlight it (or whatever) as such? No, because in the context of a well-formed C# file, that text represents a string to which the codeBlock variable is being assigned.
Similarly, in the HTML on YouTube's pages, the <object> and <embed> elements are not really elements at all in the context of the current HTML document. They are the contents of string values residing within JavaScript code.
In fact, if HtmlAgilityPack did ignore this fact and attempted to recognize all portions of text that could be HTML, it still wouldn't succeed with these elements because, being inside JavaScript, they're heavily escaped with \ characters (notice the precarious Unescape method in the code I posted to get around this issue).
I'm not saying my hacky solution below is the right way to approach this problem; I'm just explaining why obtaining these elements isn't as straightforward as grabbing them with HtmlAgilityPack.
YouTubeScraper
OK, Alex: you asked for it, so here it is. Some truly hacky code to extract your precious <object> and <embed> elements out from that sea of JavaScript.
class YouTubeScraper
{
    public HtmlNode FindObjectElement(string url)
    {
        HtmlNodeCollection scriptNodes = FindScriptNodes(url);
        if (scriptNodes == null) return null; // SelectNodes returns null when nothing matches
        for (int i = 0; i < scriptNodes.Count; ++i)
        {
            HtmlNode scriptNode = scriptNodes[i];
            string javascript = scriptNode.InnerHtml;
            int objectNodeLocation = javascript.IndexOf("<object");
            if (objectNodeLocation != -1)
            {
                string htmlStart = javascript.Substring(objectNodeLocation);
                int objectNodeEndLocation = htmlStart.IndexOf(">\" :");
                if (objectNodeEndLocation != -1)
                {
                    string finalEscapedHtml = htmlStart.Substring(0, objectNodeEndLocation + 1);
                    string unescaped = Unescape(finalEscapedHtml);
                    var objectDoc = new HtmlDocument();
                    objectDoc.LoadHtml(unescaped);
                    HtmlNode objectNode = objectDoc.GetElementbyId("movie_player");
                    return objectNode;
                }
            }
        }
        return null;
    }
    public HtmlNode FindEmbedElement(string url)
    {
        HtmlNodeCollection scriptNodes = FindScriptNodes(url);
        if (scriptNodes == null) return null; // SelectNodes returns null when nothing matches
        for (int i = 0; i < scriptNodes.Count; ++i)
        {
            HtmlNode scriptNode = scriptNodes[i];
            string javascript = scriptNode.InnerHtml;
            int approxEmbedNodeLocation = javascript.IndexOf("<\\/object>\" : \"<embed");
            if (approxEmbedNodeLocation != -1)
            {
                string htmlStart = javascript.Substring(approxEmbedNodeLocation + 15);
                int embedNodeEndLocation = htmlStart.IndexOf(">\";");
                if (embedNodeEndLocation != -1)
                {
                    string finalEscapedHtml = htmlStart.Substring(0, embedNodeEndLocation + 1);
                    string unescaped = Unescape(finalEscapedHtml);
                    var embedDoc = new HtmlDocument();
                    embedDoc.LoadHtml(unescaped);
                    HtmlNode videoEmbedNode = embedDoc.GetElementbyId("movie_player");
                    return videoEmbedNode;
                }
            }
        }
        return null;
    }
    protected HtmlNodeCollection FindScriptNodes(string url)
    {
        var doc = new HtmlDocument();
        WebRequest request = WebRequest.Create(url);
        using (var response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            doc.Load(stream);
        }
        HtmlNode root = doc.DocumentNode;
        HtmlNodeCollection scriptNodes = root.SelectNodes("//script");
        return scriptNodes;
    }
    static string Unescape(string htmlFromJavascript)
    {
        // The JavaScript has escaped all of its HTML using backslashes. We need
        // to reverse this.
        // DISCLAIMER: I am a TOTAL Regex n00b; I make no claims as to the robustness
        // of this code. If you could improve it, please, I beg of you to do so. Personally,
        // I tested it on a grand total of three inputs. It worked for those, at least.
        return Regex.Replace(htmlFromJavascript, @"\\(.)", UnescapeFromBeginning);
    }
    static string UnescapeFromBeginning(Match match)
    {
        string text = match.ToString();
        if (text.StartsWith("\\"))
        {
            return text.Substring(1);
        }
        return text;
    }
}
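The Unescape/UnescapeFromBeginning pair above drops the backslash from every backslash-plus-character pair. A minimal sketch of the same idea as a single Regex.Replace with a lambda, which additionally decodes \uXXXX escapes in case the page encodes characters such as < as \u003c. Like the original, it turns \n into a plain n, so it is only meant for escaped markup like the snippets above; JsUnescape is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class JsUnescape
{
    // \uXXXX is decoded to its character; any other backslash pair (\/, \", \\)
    // just loses the backslash, which is what the scraper's Unescape does.
    private static readonly Regex Escape = new Regex(@"\\u([0-9a-fA-F]{4})|\\(.)");

    public static string Unescape(string fromJavascript) =>
        Escape.Replace(fromJavascript, m => m.Groups[1].Success
            ? ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString()
            : m.Groups[2].Value);
}
```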
And in case you're interested, here's a little demo I threw together (super fancy, I know):
class Program
{
    static void Main(string[] args)
    {
        var scraper = new YouTubeScraper();
        HtmlNode davidAfterDentistEmbedNode = scraper.FindEmbedElement("http://www.youtube.com/watch?v=txqiwrbYGrs");
        Console.WriteLine("David After Dentist:");
        Console.WriteLine(davidAfterDentistEmbedNode.OuterHtml);
        Console.WriteLine();
        HtmlNode drunkHistoryObjectNode = scraper.FindObjectElement("http://www.youtube.com/watch?v=jL68NyCSi8o");
        Console.WriteLine("Drunk History:");
        Console.WriteLine(drunkHistoryObjectNode.OuterHtml);
        Console.WriteLine();
        HtmlNode jessicaDailyAffirmationEmbedNode = scraper.FindEmbedElement("http://www.youtube.com/watch?v=qR3rK0kZFkg");
        Console.WriteLine("Jessica's Daily Affirmation:");
        Console.WriteLine(jessicaDailyAffirmationEmbedNode.OuterHtml);
        Console.WriteLine();
        HtmlNode jazzerciseObjectNode = scraper.FindObjectElement("http://www.youtube.com/watch?v=VGOO8ZhWFR4");
        Console.WriteLine("Jazzercise - Move your Boogie Body:");
        Console.WriteLine(jazzerciseObjectNode.OuterHtml);
        Console.WriteLine();
        Console.Write("Finished! Hit Enter to quit.");
        Console.ReadLine();
    }
}
Original Answer
Why not try using the element's Id instead?
HtmlNode videoEmbedNode = doc.GetElementbyId("movie_player");
Update: Oh man, you're searching for HTML tags that are themselves within JavaScript? That's definitely why this isn't working. (They aren't really tags to be parsed from the perspective of HtmlAgilityPack; all of that JavaScript is really one big string inside a <script> tag.) Maybe there's some way you can parse the <script> tag's inner text itself as HTML and go from there.

Getting the parent name of a URI/URL from absolute name C#

Given an absolute URI/URL, I want to get a URI/URL which doesn't contain the leaf portion. For example: given http://foo.com/bar/baz.html, I should get http://foo.com/bar/.
The code which I could come up with seems a bit lengthy, so I'm wondering if there is a better way.
static string GetParentUriString(Uri uri)
{
    StringBuilder parentName = new StringBuilder();
    // Append the scheme: http, ftp etc.
    parentName.Append(uri.Scheme);
    // Append the '://' after the http, ftp etc.
    parentName.Append("://");
    // Append the host name www.foo.com
    parentName.Append(uri.Host);
    // Append each segment except the last one. The last one is the
    // leaf and we will ignore it.
    for (int i = 0; i < uri.Segments.Length - 1; i++)
    {
        parentName.Append(uri.Segments[i]);
    }
    return parentName.ToString();
}
One would use the function something like this:
static void Main(string[] args)
{
    Uri uri = new Uri("http://foo.com/bar/baz.html");
    // Should return http://foo.com/bar/
    string parentName = GetParentUriString(uri);
}
Thanks,
Rohit
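As written, the question's version rebuilds scheme://host by hand, so an explicit port such as :8080 (and any user info) is lost. A sketch of the same segment loop that starts from Uri.GetLeftPart(UriPartial.Authority) instead; the class name UriHelpers is hypothetical.

```csharp
using System;
using System.Text;

public static class UriHelpers
{
    public static string GetParentUriString(Uri uri)
    {
        // Scheme plus authority, which keeps a non-default port.
        var parentName = new StringBuilder(uri.GetLeftPart(UriPartial.Authority));
        // Append every path segment except the last one (the leaf).
        for (int i = 0; i < uri.Segments.Length - 1; i++)
        {
            parentName.Append(uri.Segments[i]);
        }
        return parentName.ToString();
    }
}
```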
Did you try this? Seems simple enough.
Uri parent = new Uri(uri, "..");
This is the shortest I can come up with:
static string GetParentUriString(Uri uri)
{
    return uri.AbsoluteUri.Remove(uri.AbsoluteUri.Length - uri.Segments.Last().Length);
}
If you want to use the Last() method, you will have to include System.Linq.
There must be an easier way to do this with the built-in Uri methods, but here is my twist on @unknown (yahoo)'s suggestion.
In this version you don't need System.Linq and it also handles URIs with query strings:
private static string GetParentUriString(Uri uri)
{
    return uri.AbsoluteUri.Remove(uri.AbsoluteUri.Length - uri.Segments[uri.Segments.Length - 1].Length - uri.Query.Length);
}
Quick and dirty
int pos = uriString.LastIndexOf('/');
if (pos > 0) { uriString = uriString.Substring(0, pos); }
Shortest way I found:
static Uri GetParent(Uri uri)
{
    return new Uri(uri, Path.GetDirectoryName(uri.LocalPath) + "/");
}
PapyRef's answer is incorrect, UriPartial.Path includes the filename.
new Uri(uri, ".").ToString()
seems to be the cleanest/simplest implementation of the function requested.
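Relative references resolve against the directory of the base URI, which is why "." drops the leaf segment. A small sketch contrasting it with the ".." used in an earlier answer; RelativeParents and its method names are hypothetical.

```csharp
using System;

public static class RelativeParents
{
    // "." resolves to the directory that contains the last path segment.
    public static Uri ContainingDirectory(Uri uri) => new Uri(uri, ".");

    // ".." resolves one level above that directory.
    public static Uri DirectoryAbove(Uri uri) => new Uri(uri, "..");
}
```

For a URI naming a file, "." gives the parent the question asks for (http://foo.com/bar/ for http://foo.com/bar/baz.html), while ".." lands one level higher. For a URI that already ends in /, "." returns it unchanged, which is one reason the next answer trims the trailing slash first.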
I read many answers here but didn't find one that I liked because they break in some cases.
So, I am using this:
public Uri GetParentUri(Uri uri)
{
    var withoutQuery = new Uri(uri.GetComponents(UriComponents.Scheme |
                                                 UriComponents.UserInfo |
                                                 UriComponents.Host |
                                                 UriComponents.Port |
                                                 UriComponents.Path, UriFormat.UriEscaped));
    var trimmed = new Uri(withoutQuery.AbsoluteUri.TrimEnd('/'));
    var result = new Uri(trimmed, ".");
    return result;
}
Note: It removes the Query and the Fragment intentionally.
new Uri(uri.AbsoluteUri + "/../")
Get the segments of the URL:
url = "http://localhost:9572/School/Common/Admin/Default.aspx"
Dim name() As String = HttpContext.Current.Request.Url.Segments
Now simply use a for loop or an index to get the parent directory name:
code = name(2).Remove(name(2).IndexOf("/"))
This returns "Common".
Thought I'd chime in: despite it being almost 10 years later, with the advent of the cloud, getting the parent Uri is a fairly common (and IMO more valuable) scenario, so combining some of the answers here you would simply use (extended) Uri semantics:
// Extension methods must be declared in a static class; Last() needs using System.Linq.
public static Uri Parent(this Uri uri)
{
    return new Uri(uri.AbsoluteUri.Remove(uri.AbsoluteUri.Length - uri.Segments.Last().Length - uri.Query.Length).TrimEnd('/'));
}
var source = new Uri("https://foo.azure.com/bar/source/baz.html?q=1");
var parent = source.Parent(); // https://foo.azure.com/bar/source
var folder = parent.Segments.Last(); // source
I can't say I've tested every scenario, so caution advised.
